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Preface 



This book is an introduction to game theory from a mathematical perspective. 
It is intended to be a first course for undergraduate students of mathematics, 
but I also hope that it will contain something of interest to advanced students 
or researchers in biology and economics who often encounter the basics of game 
theory informally via relevant applications. In view of the intended audience, 
the examples used in this book are generally abstract problems so that the 
reader is not forced to learn a great deal of a subject - either biology or eco- 
nomics - that may be unfamiliar. Where a context is given, these are usually 
“classical” problems of the subject area and are, I hope, easy enough to follow. 

The prerequisites are generally modest. Apart from a familiarity with (or 
a willingness to learn) the concepts of a proof and some mathematical nota- 
tion, the main requirement is an elementary understanding of probability. A 
familiarity with basic calculus would be useful for Chapter 6 and some parts of 
Chapters 1 and 8. The basic ideas of simple ordinary differential equations are 
required in Chapter 9 and, towards the end of that chapter, some familiarity 
with matrices would be an advantage - although the relevant ideas are briefly 
described in an appendix. 

I have tried to provide a unified account of single-person decision problems 
(“games against nature”) as well as both classical and evolutionary game the- 
ory, whereas most textbooks cover only one of these. There are two immediate 
consequences of this broad approach. First, many interesting topics are left out. 
However, I hope that this book will provide a good foundation for further study 
and that the books suggested for further reading at the end of this volume will 
go some way to Ailing the gaps. Second, the notation and terminology used 
may be different in places from that which is commonly used in each of the 
three separate areas. In this book, I have tried to use similar (combinations of) 




VI 



Preface 



symbols to represent similar concepts in each part, and it should be clear from 
the context what is meant in any particular case. 

If time is limited, lecturers could make selections of the material according 
to the interests and mathematical background of the students. For example, 
a course on non-evolutionary game theory could include material from Chap- 
ters 1, 2, and 4-7. A course on evolutionary game theory could include material 
from Chapters 1, 2, 4, 8, and 9. 

Finally, it is a pleasure to thank Vassili Kolokoltsov, Hristo Nikolov, and two 
anonymous reviewers whose perceptive comments have helped to improve this 
book immeasurably. Any flaws that remain are, of course, the responsibility of 
the author alone. 



Nottingham James Webb 

May 2006 
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Part I 



Decisions 




1 

Simple Decision Models 



1.1 Optimisation 

Suppose we are faced with the problem of making a decision. One approach to 
the problem might be to determine the desired outcome and then to behave in 
a way that leads to that result. This leaves open the question of whether it is 
always possible to achieve the desired outcome. An alternative approach is to 
list the courses of action that are available and to determine the outcome of each 
of those behaviours. One of these outcomes is preferred because it is the one that 
maximises^ something of value (for example, the amount of money received). 
The course of action that leads to the preferred outcome is then picked from the 
available set. We will call the second approach “making an optimal decision”. In 
this book, we will develop a mathematical framework for studying the problem 
of making an optimal decision in a variety of circumstances. 

Finding the maximum of something is a familiar procedure in basic calculus. 
Suppose we are interested in finding the maximum of some function, let’s call 
it f{x). We differentiate / and set the result equal to zero. A solution of this 
equation gives us one or more values of x at which a maximum is attained, 
which we might call x* . The maximum value of the function is then f(x*). (We 
must also check that the value of the second derivative of / to make sure that 
f{x*) is really a maximum.) 

In basic calculus, it is usually assumed (often without being mentioned) that 



^ In some situations, the aim may be to minimise a loss. In this case, we can still 
talk abont maximisation by considering the negative of the loss as the ontcome. 
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the function we wish to maximise is defined for all real values of x {i.e., x £ K) 
and is continuous. However, we will encounter problems in which one or both of 
these assumptions are untrue: the function /(x) may only be defined for values 
of X in a compact set X; the set X may be discrete; or the function being 
maximised may be discontinuous by definition. These distinctions can have 
important consequences, as is shown by the following example and exercises. 



Example 1.1 

Consider the function /(x) = i/x- H this function is defined for all x € [0, oo), 
then it keeps increasing as x increases and so it does not have a maximum. 
However, if the function is only defined for x € [0,4], then the function does 
have a maximum. The maximum value of / is attained at one boundary x* = 4 
and f(x*) = 2. 

Exercise 1.1 

Maximise F{n) = 1 — + |n where n is integer. (Remember that n* 

must be an integer.) 

Exercise 1.2 

Let a, b and c be positive constants. Let 

lo 

and 

x(*) = I J 

Maximise g{x) = axf{x) — cx(x). 

We will often wish to focus on the value of x at which the maximum is 
achieved rather than the maximum value of the function itself, so we introduce 
a new symbol argmax. 



otherwise 

if X > 0 
if X < 0 



Definition 1.2 

Suppose X is an arbitrary member of some set X. Let /(x) be some function 
that is defined Vx G X. Then the symbol argmax is defined by the following 
equivalence. 



X* G argmax /(x) 

kGX 



/(x*) = max/(x) . 
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That is, X* is a value that maximises the function f{x). Note that we do not 
write X* = argmax,„gx because a function may take its maximum value 
for more than one element in the set X. Because the symbol argmax returns 
a set of values rather than a unique value, it is called a correspondence rather 
than a function. 

Example 1.3 

Consider the function defined on a: G [—2,2] by f{x) = x“^ . This function 
achieves its maximum at x* = ±2. So 

argmax = {+ 2 , — 2 } . 

a:G[-2,2] 



Exercise 1.3 

(a) Let f{x) = 1 + 6a; — be defined Va; G K. Find argmax^j.^^ f{x). 

(b) Let f{x) = l + 6x — be defined Vx G [1, 2]. Find argmax,j,g[^ 2] fi^)- 

(c) Let /(x) = (1 — x)^ be defined Vx G [0,3]. Find argmax,j,g [q 3j /(x). 



1.2 Making Decisions 

The simplest case to consider is when there is no randomness in the environment 
- once a choice has been made, the outcome is certain. To begin to build a 
theory of optimal decisions, we make the following definitions. 

Definition 1.4 

A choice of behaviour in a single-decision problem is called an action. The set 
of alternative actions available will be denoted A. This will either be a discrete 
set, e.g., {01,02,03 , . . .}, or a continuous set, e.g., the unit interval [0, 1]. 

Definition 1.5 

A payoff is a function tt: A — >■ M that associates a numerical value with every 
action o G A. 
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Definition 1.6 

An action a* is an optimal action if 

7r(a*) > 7r(a) Va G A 



or, equivalently, 



a* G argmaxTr(a) . 

a^A 



( 1 . 1 ) 

( 1 . 2 ) 



That is, the optimal decision is to choose an a* G A that maximises the 
payoff 7r(a). In general, a* need not be a unique choice of action: if two actions 
lead to the same, maximal payoff, then either will do (notice the weak inequality 
in the first form of the definition). 

Example 1.7 

A jobseeker is offered two jobs, J\ and J 2 . Their possible actions are = accept 
Ji with t = 1,2. The payoffs are the salaries on offer: Ji pays £15000, J 2 pays 
£17000. Because 7r(ai) = 15000 and 71 ( 02 ) = 17000 the optimal decision is 
a* = 02 (i.e., accept the second job). 

Exercise 1.4 

An investor going to invest £1000 for a year and has narrowed the choice 
to one of two savings accounts. The two accounts differ only in the rate of 
return: the first pays 6% annually, and the second pays 3% at six month 
intervals. Which account should the investor choose? Does the answer 
depend on whether or not the initial capital is included in the payoff? 

In the previous example and exercise, the optimal decisions are not altered 
if payoffs are given in U.S. dollars (or any other currency) rather than pounds; 
nor are they altered if £1000 is added to each payoff. These alterations to the 
payoffs are both examples of affine transformations. 

Definition 1.8 

An affine transformation changes payoffs 7r(a) into payoffs 7r'(a) according to 
the rule 

7r'(a) = o;7r(a) + j3 

where a and /? are constants independent of a and a > 0. 
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Theorem 1.9 

The optimal action is unchanged if payoffs are altered by an affine transforma- 
tion. 



Proof 

Because a > 0 we have 

argmaxTr'(a) 

aSA 



argmax [o;7r(o) -I- /3] 

aSA 

argmax 7T (a) . 

aSA 



□ 



We now consider some problems in which the action set is a continuous 
subset of K. This can arise as a convenient approximation for models in which 
a discrete action set has a large number of elements. For example, if we are 
selling something, we might want to consider charging prices between £0.01 
and £5.00. Because we can only charge prices in whole pennies, the action set 
is discrete. But, rather than consider the consequences of 500 separate actions, 
we treat price as continuous and employ the powerful features of calculus to 
solve the problem. 



Example 1.10 



The Convent Fields Soup Company makes tomato soup. If it charges a price 
of p pounds per litre, then the market will buy Q{p) litres, where 



Q{p) 



Qo (l - if £ < Po 

0 if P > Po 



Q{p) is a non-increasing function of price p. So Qo is ^ constant that gives 
the maximum quantity that could be sold (at any price), and po is a constant 
that gives the maximum price that the market would be prepared to pay. The 
actions available to the company are the choice of a price p € [0,po]. There 
is no point in setting a price above po because the company would sell no 
soup. Suppose that the cost of producing soup is c pounds per litre. Taking 
the company’s profit as its payoff, we have 7r(p) = (p — c)Q(p). The optimal 
decision is to set a price p* that maximises the profit. To find this price, we 
find the maximum of the payoff as a function of price. Because 



dw 

dp 



(P*) 



Qo 





= 0 
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the optimal action is, therefore, to choose a price P* = \ {po + c). 



Example 1.11 



In the previous example, we assumed that the soup factory has already been 
constructed. The problem changes if the decision is being made before the 
factory is built. For simplicity, let us assume that the marginal cost of pro- 
duction is zero (the more general case is covered in Exercise 1.6) and suppose 
that it costs an amount B to build the factory. The payoff then appears to be 
tt{p) = pQ{p) — B. Because i? is a constant, the optimal price is p* = |po- 
However, if soup is sold at this price, the profit made by the company is 



tt{p*) 



QoPo _ 
4 



If B is large enough the company could make a loss by selling soup at this 
price. The optimal action is, therefore, to choose a price 



I I (po + c) if the profit will be positive 
i 0 otherwise. 



Exercise 1.5 

A company makes small widgets. If the manufacturer produces q widgets 
per day, they can be sold at a price P{q), where 

P{q) = Po max I (^1 - ) 0 

Assume the number of widgets produced is very large, so q can be treated 
as a continuous variable, (a) What quantity should be made to maximise 
the manufacturer’s income? (b) If manufacturing costs increase linearly 
with the number of widgets made (i.e., cost = cq), what quantity max- 
imises the manufacturer’s profit? 

Exercise 1.6 

A company is considering building a factory to make fertilizer. At a price 
p, T{p) tonnes will be sold, where 

T{p) = To max I ^1 - >0 

Suppose manufacturing costs increase with the tonnage made, t, as 
C{t) = Cq + c\t where cq and c\ are non-negative constants. What price 
would maximise the manufacturer’s income? Should the company build 
the factory? 
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So far, we have assumed that there is no uncertainty about the consequences 
of any decisions. If uncertainty exists, we can compare the expected outcome 
(in the probabilistic sense) for each action. Uncertainty about payoffs can be 
represented as a random variable, X, which takes certain values corresponding 
to possible “states of Nature” (e.g., economic conditions) with specified proba- 
bilities. We will denote the set of “states of Nature” by X and the probability 
with which a particular state x occurs will be denoted P{X = x). If the payoff 
associated with action a when the state of Nature is x is 7r(a|x), then the payoff 
for adopting action a is 

7r(a) = Tr{a\x)P{X = x) 

xGX 

and an optimal action is 

a* G argmax W 7r(a|a;)P(X = x) . 

xGX 



Example 1.12 

An investor has £1000 to invest for one year. Their^ available actions are 

oi: Put the money in a building society account that yields 7% interest p.a. 

02 : Invest in a share fund that gives a return of £1500 if the stock market 
performs well and £600 (i.e., a loss of £400) if the stock market performs 
badly. 

The state of Nature is the performance of the stock market, which is good 50% 
of the time and bad for the remaining 50% of the time. So we have the set 
of states X = {Good, Bad} with P{X = Good) = P{X = Bad) = 0.5. The 
expected payoffs (in pounds) for the two possible actions are 

7r(ai) = 1070 

7r(a2) = ^1500 -k ^600 = 1050. 

So, the optimal action is ai (put the money in the building society). 

^ As always, there is a problem in writing about individuals whose gender is irrel- 
evant: which pronoun to use? Rather than make an invidious choice or use the 
cumbersome “he or she” , I have opted to use “they” , “them” , and “their” . For 
example, “they should use the following strategy”. The use of these grammatically 
plural pronouns to refer to an individual is common in colloquial English and the 
mismatch between grammatical and actual number also occurs in other languages 
(for example, the polite use of “vous” in French). 
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Exercise 1.7 

Consider the following table of payoffs 7r(o|a;) for action set A = 
{01,02,03} and states of nature X = (xi, a;2, X3, X4}. 





Xi 


X2 


X3 


X4 


Oi 


3 


0 


3 


0 


02 


0 


3 


0 


3 


O3 


1 


1 


1 


1 



What are the optimal actions if 

(a) P{X = xi) = P{X = X2) = P{X = X3) = P{X = Xi)=\ 

(b) P{X = xi) = P{X = X3) = I and P{X = X2) = P{X = X4) = |? 



If A is a continuous random variable, then we use a density function f{x) 
with P{x < X < X + dx) = f{x)dx. Then the expected payoff for adopting 
action a is 



r(o) = / 

J X 



7 r(o|a:)/(x) dx 



kGX 



and an optimal action is 



G argmax 

oGA 



7 r(o|a;)/(x) dx . 



xex 



Example 1.13 

Suppose that an investor has a choice between two investments oi and 02 with 
payoffs 7r(oi|x) = w(l + r) and 7r(o2|a;) = w + X where A is a normally 
distributed random variable, A ~ A(/r, <t^). For example, oi could represent 
putting an initial capital w into a savings account with interest rate r and 02 
could represent investing the same amount in the stock market. The expected 
payoffs for the two actions are 7 r(ai) = w(l + r) and 

/ +00 

xf{x) dx 

-00 

= W + IJ, 



so the optimal action is 

, f oi ifwr > fr 

a = i .j, 

[ 02 if wr < fi 

If wr = IJ, then the investor is indifferent between oi and 02 (i.e., both oi and 
02 are optimal). 
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Exercise 1.8 



Each day a power company produces u units of power at a cost of c 
dollars per unit and sells them at a price p dollars per unit. Suppose 
that demand for power is exponentially distributed with mean d units, 



i.e., 



/W=lexp(-i) . 



If demand exceeds supply, then the company makes up the shortfall 
by buying units from another company at a cost of k dollars per unit 
{k > c). Show that the expected profit for the company (in dollars) is 



7 t ( m ) = pd — cu — kde 



and find the optimal level of production. 



1.3 Modelling Rational Behaviour 

Suppose a person is approached by a wealthy philanthropist who offers them 
a choice between getting £1 for certain and a 50% chance of getting £3 (and 
a 50% chance of nothing). Should the person choose the certain outcome or 
the gamble? What should they choose if the sums involved were £1 million 
and £3 million? Based on our procedure from the previous section, we might 
be tempted to say that the person “should” choose the gamble in each case 
because the expected amount of money received is higher than the amount of 
money received if the gamble is refused. However, most people will gamble with 
the low amounts but go for the certain million. Are they being inconsistent or 
irrational? 

Is maximising expected monetary value (EMV) what people should do? 
There are several reasons why they might not. First, people value things other 
than money: holidays, health, happiness, even the well-being of other people. 
Second, for most people money is only a means to an end so the “real” value 
of an amount of money need not be equal to its face value. Consider the case 
of the wealthy philanthropist who is offering the choice involving millions of 
pounds. Receiving £1 million will allow the recipient to retire and not have 
to worry about pensions or life insurance. Receiving £3 million is better than 
receiving £1 million but it isn’t 3 times as good because it is not possible 
to retire three times over. Third, the reaction to uncertainty may depend on 
personal circumstances. 
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Example 1.14 

When it sells some insurance, a company assesses the probabilities of various 
payouts. From this it calculates its expected loss L. To this loss it adds its 
profit P and charges C = L + P. Assuming that a customer agrees with the 
company about the expected loss L, why do they buy insurance (given that 
C > L )1 Part of the reason may be that, although they can afford the expected 
loss, they could not afford the actual loss if it occurred. So, maximising EMV is 
appropriate for one “individual” (the insurance company), but not for another 
(the customer). 

Apart from these considerations, there is another - more important - rea- 
son why we should not define rationality as maximising EMV: it switches the 
origin of a behaviour with its consequence. What we would like to do is define 
rationality in some way and then determine, as a consequence of this definition, 
whether we can derive any quantity that rational people will maximise. If we 
can do this, then we can use the procedures we have begun to develop with the 
quantity we have found taking the place of EMV in our calculations. So how 
can we define rationality? 

The first thing to note is that rationality should not be equated with dis- 
passionate reasoning (notwithstanding the view held by certain aliens from a 
popular science fiction series) . An individual’s desires lead to a ranking of out- 
comes in terms of preference. These preferences need not accord with those of 
another individual; however, they should be internally consistent, if they are 
to form a basis for choice. Thus we will define a rational person as one who 
has consistent preferences concerning outcomes and will attempt to achieve a 
preferred outcome. 

Suppose we have a set of possible outcomes (for example, eating a ham- 
burger or eating a salad). When asked, people will express preferences con- 
cerning these outcomes. These preferences are not necessarily the same for all 
people: some may prefer the salad while others prefer the hamburger. Given 
a free choice, people should choose their preferred outcome. (For the moment 
we will assume that there is no uncertainty about the consequence of a choice: 
choosing to act in a particular way definitely leads to the desired outcome.) 
Anyone who really prefers the hamburger but then chooses to eat a salad would 
be acting “irrationally” . Someone who says they prefer the hamburger but then 
chooses to eat a salad because they are on a diet has not expressed their true 
preferences because they have failed to include their desire to lose weight. 

Let the set of possible outcomes be denoted by 17 = {oji,oj2,oj3, . . We 
will write u)2 if an individual strictly prefers outcome wi over outcome u)2- 
We will write oji ~ 0J2 if an individual is indifferent between the two outcomes. 
Weak preference will be expressed by the operator The expression wi ^ uj2 
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means that an individual either prefers Wi to 0J2 or is indifferent between the 
two outcomes. 

Definition 1.15 

An individual will be called rational under certainty if their preferences for 
outcomes satisfy the following conditions: 

1 . (Completeness) Either u>i ^ W2 or u>2 ^ wi. 

2 . (Transitivity) If uji ^ u>2 and 102 ^ W3 then wi ^ W3. 

The completeness condition ensures that all outcomes can be compared 
with each other. The transitivity condition implies that outcomes can be listed 
in order of preference (possibly with ties between some outcomes). Together 
these conditions imply that we can introduce the idea of a utility function. 
An individual will be assumed to have a personal utility function u{ui) that 
gives their utility for any outcome, ui. The outcome oj may be numeric (e.g., an 
amount of money or a number of days of holiday) or less tangible (e.g., degree 
of happiness) . Whatever the reward is, the utility function assigns a number to 
that reward and encapsulates everything about an outcome that is important 
to the particular individual being considered. 



Definition 1.16 

A utility function is a function u: — >■ K such that: 

u{uji) > u{uj2) OJ 2 

u{uJi) = U{UJ2) ~ W2 

An immediate consequence of this definition is that an individual who is 
rational under certainty should seek to maximise their utility. The relation 
between the utility function u and the payoff function tt is straightforward. 
Suppose choosing action a produces outcome uj(a) then 7 r(a) = u(co(a)). 

Now let us consider what happens when an action does not produce a 
definite outcome and instead we allow each outcome to occur with a known 
probability. Such uncertain outcomes will be called “lotteries”. 



Definition 1.17 

A simple lottery, A, is a set of probabilities for the occurrence of every w G Jl. 
We shall denote the probability that outcome w occurs in lottery A by p(w|A). 
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The set of all possible lotteries will be denoted A. (Although the set of lotteries 
depends on the basic set of outcomes we will not make this dependence 
explicit.) 



Definition 1.18 

A compound lottery is a linear combination of simple lotteries (from the same 
set A). For example, q\\ + (1 — g)A2 with 0 < g < 1 is a compound lottery. 

A compound lottery can be regarded as a lottery in which the outcomes are 
themselves lotteries. The example lottery given in the definition could be taken 
to mean that simple lottery Ai occurs with probability q and simple lottery 
A2 occurs with probability 1 — g. Compound lotteries are not really different 
from simple lotteries: the compound lottery gAi + (1 — g)A2 is equivalent to 
a simple lottery A with probabilities p{iv\X), which can be determined from 
the probabilities p{uj\Xi), p{to\X2) and the parameter g. However, the ability to 
define lotteries as combinations of other lotteries is useful for the definition of 
rationality. 



Definition 1.19 

An individual will be called rational under uncertainty or just rational if their 
preferences for lotteries satisfy the following conditions: 

1 . (Completeness) Either Ai ^ A2 or A2 ^ Ai. 

2 . (Transitivity) If Ai ^ A2 and A2 ^ A3 then Ai ^ A3. 

3 . (Monotonicity) If Ai A2 and gi > g2 then giAi + (I — gi)A2 g2Ai + 
(I — q 2 )X 2 - 

4 . (Continuity) If Ai ^ A2 and A2 ^ A3 then there exists a probability g such 
that X2 ^ gAi + (1 - q)X^. 

5 . (Independence) If Ai A2 then gAi + (1 — q)Xz >- qX2 + (1 — g)A3 

As above, the completeness condition ensures that all lotteries can be com- 
pared with each other and the transitivity condition implies that lotteries can 
be listed in order of preference (possibly with ties). The monotonicity and con- 
tinuity conditions assert that a lottery gets better smoothly as the probability 
of a preferred outcome increases. The independence condition implies that pref- 
erences only depend on the differences between lotteries; components that are 
the same can be ignored. 
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Suppose that choosing action a produces lottery A(a). What is the payoff 
7r(a) that a rational individual will seek to maximise? We might try to introduce 
a utility function for the expected outcome E(w). There are two problems with 
this. First, it is not clear what E(w) would mean for outcomes that are not given 
numerically. Second, even when the outcomes are given numerically, it seems 
that people do not necessarily maximise any function of the expected outcome. 
Consider the example of wealthy philanthropist again. If an individual takes the 
gamble for the £3 million, then the expected outcome is £1.5 million. Because 
people prefer the certain million to the gamble, it would seem - if they are 
maximising some sort of “utility of the expected outcome” - that the utility 
of £1 million is greater than the utility of £1.5 million, which seems highly 
unlikely. An alternative to maximising the “utility” of the expected outcome is 
maximising the expected utility. 



Theorem 1.20 (Expected Utility Theorem) 

If an individual is rational in the sense of Definition 1.19, then we can define a 
utility function u: — >■ K and rational individuals act in a way that maximises 

the payoff function 7r(a) (the expected utility) given by 

= X! P(t^|A(a))M(u;) . (1.3) 



Proof 

A more detailed discussion and proof of the Expected Utility Theorem is given 
by Myerson (1991). □ 

Remark 1.21 

The conditions for rationality expressed in Definition 1.19 only determine the 
utility function up to an affine transformation (see Definition 1.8). However, 
this does not present a problem, because the optimality of any behaviour is not 
altered by a change of this type (see Theorem 1.9). 

The explicit construction of a utility function, which is important for con- 
structing realistic models of a person’s behaviour (and is a problem that must 
be solved for each model) will be ignored in this book. We will either assume 
that maximising EMV is appropriate or specify a utility function. We will 
also consider completely abstract situations and, in these cases, an individual’s 
payoff will be tacitly specified in “units of utility” without worrying about the 
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Figure 1.1 The utility function for the individual considered in Exercise 1.9. 
The utility u{w) of wealth w is such that E(u(w)) < u(E(w)), so this individual 
is risk averse. 

various components of an outcome that actually determine this value. 

Exercise 1.9 

Consider an individual whose utility function of wealth, w, is given by 
u{w) = 1 — exp(— /cm) with k > 0. Assuming that wealth increments are 
Normally distributed, show that an individual’s expected utility can be 
represented as a trade-off between mean and variance, as in Equation 
(1.4). 

Definition 1.22 

An individual whose utility function satisfies E(u(w)) < ufEfw)) is said (assum- 
ing E(u;) can be defined) to be risk averse. If E(u(o;)) > u(Ew) the individual 
is said to be risk prone. If E(u(u;)) = u(Ew) the individual is said to be risk 
neutral. 

Example 1.23 

The individual considered in Exercise 1.9 is risk averse. (See Figure 1.1.) 



Example 1.24 

Consider the following classical portfolio choice problem. Two assets are avail- 
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able to an investor. One is riskless (e.g., a bank account) providing a fixed return 
of r on the initial sum; the other is risky (e.g., stock market) with a return, 
having a mean /x and a standard deviation cr. If the investor is a straightforward 
EMV-maximiser, then they should invest all of their money in stocks if /x > r. 
However, in some circumstances, a risk-averse investor may prefer to trade-off 
the expected return and its variance in a linear fashion (see Exercise 1.9). In 
other words, they can reduce the variability of their return by constructing a 
portfolio in which they place some fraction of their money in the bank and 
invest the remainder in the stock market. If a is the fraction that they place 
in stocks, then the expected return on the portfolio is a/x -|- (1 — a)r and its 
variance is So the investor’s expected utility is 

7r(a) = a/x -I- (1 — a)r — (1-4) 

where k represents the value that the investor places on the variance relative 
to the expectation. This expected utility is maximised for 

{ 0 if /X < r 

^ if 0 < /X — r < . 

1 fi — r > ka'^ 

(Check that the second derivative is negative; or calculate 7r(a*), 7r(0) and 
7t(1).) 



Exercise 1.10 

Consider an individual whose utility function of wealth, w, is quadratic: 
u{w) = w—kuP' , where the constant k is such that u{w) is non-decreasing 
over the allowed range for w. Repeat the portfolio problem from Exam- 
ple 1.24. 



1.4 Modelling Natural Selection 

In this section, we will consider the - at first, rather surprising - proposition 
that the mathematics describing optimal decisions by rational individuals can 
also be applied to the behaviour of animals. 

The assertion of optimal behaviour by animals rests on the following in- 
terpretation of Natural Selection. In the past, a population of animals from a 
single species contained several types of individual that were genetically pro- 
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grammed to use one of a variety of behaviours.^ Some of these behaviours result 
in the animals having few descendants, other behaviours result in animals hav- 
ing many descendants. Through their genes, parents pass on their programmed 
behaviour to their offspring and after many generations, the type of animal 
that leaves the greatest number of descendants will be numerically dominant 
in the population. 

Example 1.25 

Consider a population consisting of two types of individual, labelled i = 1,2. 
The animals live for a year, breed once and then die. Individuals of type i have 
ri offspring, where we assume (without loss of generality) that ri > r 2 . Suppose 
that at time t there are rii(t) animals of type i, then at time t + 1 (i.e., the 
following year) there will be rii(t+ 1) = ririiit) of each type. Starting from time 
t = 0 when there are rii(O) animals of each type, there are 

rii{l) = ririi{0) at time t = 1; 

rii{2) = rim{l) = rfrii(O) at time t = 2; 

rii{3) = rim{2) = rfni(l) = rfni(O) at time t = 3; 

rii{t) = ririi{t — 1) = • • • = r|ni(0) at time t. 

So the ratio of the numbers of the two types at time t is 

n2{t) _ / n2(0) 

\rij ni(0) 

This ratio tends to zero as t — 1 oo. In other words, the population comes to 
be dominated by animals of type 1. We can paraphrase the action of Natural 
Selection by saying that the animals should “choose” the behaviour that gives 
the reproduction rate ri. 

Exercise 1.11 

Duck-billed platypuses lay n eggs, where n is a characteristic that varies 
between individuals and is inherited by a platypus’s offspring. The prob- 
ability that each egg hatches is H{n) = 1 — krE where k = 0.1. After 
many generations of Natural Selection how many eggs will platypuses be 
laying? (Remember that n is an integer.) 

® Actually the relationship between genetics and behaviour may be quite compli- 
cated and is, in general, poorly understood. The procedure of treating particular 
behaviours as heritable units is called the phenotypic gambit and has proved to be 
a useful starting point. 





1.4 Modelling Natural Selection 



19 



Definition 1.26 



The fitness of a behaviour is defined to be the asymptotic growth rate of the 
sub-population of animals using that behaviour. That is, for animals with be- 
havioural type i we can define an annual growth rate as 



n{t) 



nj{t + 1 ) 

n,{t) 



and the fitness for this type is given by 7r(i) = limj_>oo 



It is, therefore, a matter of definition that Natural Selection acts in such a 
way as to maximise fitness. If we want to explain (in evolutionary terms) the 
behaviour of animals, we consider a set of plausible alternative behaviours and 
find the one that maximises fitness: this is the behaviour that animals should be 
using (provided Natural Selection has had enough time to act). When we use the 
criterion that an animal should behave in a way that maximizes its fitness, we 
don’t imagine that an individual animal is performing complex calculations in 
order to do this. The language of choice and optimisation is used as a convenient 
short-hand for the action of Natural Selection. 

In Example 1.25, the fitnesses for the two behaviours were just the respective 
reproduction rates Vi. The next example shows that it is not always appropriate 
to use the number of offspring produced during an animal’s life as a measure 
of fitness. 



Example 1.27 

Suppose an animal has two possible behaviours: 

oi: Produce 8 offspring, then die. (“Live fast and die young”.) 

02 : Produce 5 offspring in the first year, produce 6 more offspring in a second 
year and then die. (“Live slowly and die old”.) 

The fitness for behaviour oi is simply 7r(oi) = 8. Determining the fitness for 
behaviour 02 is a bit more involved. In this case, the sub-population at time t 
consists of f(t) first-year breeders and s(t) second-year breeders. These numbers 
change from year to year according to 

f(t+l) = 5f(t) + 6s(t) 
s(t+l) = f(t) . 

Adding these two equations gives us 7r(o2) = 6. The animal should, therefore, 
choose ai- That is. Natural Selection should produce a population of animals 
which “live fast and die young” . This tells us that we should avoid a common 
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misinterpretation of the phrase “survival of the fittest” : it is genetic lines, not 
individuals, which must survive. 

In practice, even the expected number of offspring may not be calculated 
explicitly. Instead some factor that affects fitness is considered, if it can be 
reasonably assumed that this is the only component of fitness that is affected 
by the variety of behaviours being considered. 



Example 1.28 

Suppose an animal chooses actions from a set A = { 01 , 02 } and the animal’s 
probability of survival to the breeding season is Si if it chooses action oi . If the 
animal survives to breed, it has n offspring. The payoff/fitness for adopting o^ 
is nSi- Because the factor n is common to the payoffs for all actions, we may 
consider only the survival probabilities: 7r(oi) = Si. So a* = oi if Si > S 2 and 
a* = 02 if Si < S' 2 . 



Exercise 1.12 

Before migrating to its breeding site, a bird must try to build up its 
energy reserves x. The bird can choose to forage in any one of three 
sites, i = {1,2,3}. On each site the bird has a probability Xi of being 
eaten by a predator and at the end of the pre-migration foraging period 
(if it has not been eaten) a bird’s reserves will be either high or low 
with certain probabilities. The parameters for each site are given in the 
following table. 



Site 


1 2 3 


A 

P{x = high) 


0.2 0.1 0.05 

0.8 0.6 0.4 



The probability of surviving migration is = 0.9 if reserves are high 
and Ml = 0.5 if reserves are low. If a bird survives, it produces a fixed 
number of offspring during the next breeding season. Which site should 
the bird choose?"* 

Remember, this is just a shorthand way of asking “what is the result of natural 
selection acting for many years on a population of birds?” 
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1.5 Optimal Behaviour 

Up till now, we have considered the problem of finding an optimal action a* 
from a given set A. However, another type of behaviour may be available to 
an individual: they may randomise. Does this allow an individual to achieve a 
higher payoff than if they stick to picking an action? 

Definition 1.29 

We specify a general behaviour (3 by giving the list of probabilities with which 
each available action is chosen. We denote the probability that action a is 
chosen by p(a) and 

^p(a) = 1 . 

aSA 

The set of all possible randomising behaviours (for a given problem) will be 
denoted by B. 

The payoff for using a behaviour j3 is related to the payoffs for the actions 
in the obvious way. The payoff for using (3 is given by 

= ^p{a)T:{a) . (1.5) 

oSA 

In an uncertain world, we can also define the payoffs 

p{a)T^{a\x) 

ae A 

SO that 

^ ■ (1-6) 

Exercise 1.13 

Show that the payoff for a behaviour (3 is the same whether we define it 
via Equation 1.5 or Equation 1.6. 



Definition 1.30 

An optimal behaviour j3* is one for which 

7t(/3*) > 7t(/3) V/3 G B (1.7) 

or, if we focus on behaviours rather than payoffs, 

(3* G argmax7r(/3) . 

/3GB 



( 1 . 8 ) 
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Definition 1.31 

The support of a behaviour /3 is the set A(/3) C A of all the actions for which 
P specifies p{a) > 0. 

Theorem 1.32 

Let P* be an optimal behaviour with support A*. Then 7r(a) = tt{P*) Vo G A*. 

Proof 

If the set A* contains only one action, then the theorem is trivially true. Suppose 
now that the set A* contains more than one action. If the theorem is not true, 
then at least one action gives a higher payoff than tt{P*). Let a' the action 
which gives the greatest such payoff. Then 

aG A* 

= X! P*(“)^(“) + P*(o')7r(a') 

a^a' 

< '^P*{a)TT{a') +p*{a')Tr{a') 

a^a' 

= Tr(a') 

which contradicts the original assumption that P* is optimal. □ 

A consequence of this theorem is that if a randomising behaviour is optimal, 
then two or more actions are optimal as well. So, randomisation is not necessary 
to achieve an maximal payoff. However, it may be used as a tie-breaker for 
choosing between two or more equally acceptable actions. 

Exercise 1.14 

A firm may make one of three marketing decisions {oi , 02, 03}. The profit 
(in millions of pounds) for each decision depends on the state of the 
economy X = {a;i,X2,a;3} as given in the table below. 



Xi 


X2 


X3 


a\ 6 


5 


3 


02 3 


5 


4 


03 5 


9 


1 



If P{X = x\) = I and P{X = X2) = P{X = X3) = |, find all optimal 
behaviours. 






2 

Simple Decision Processes 



2.1 Decision Trees 

A man hears that his young daughter always takes a nickel when an adult 
relative offers her a choice between a nickel and a dime. He explains to his 
daughter, “A dime is twice as valuable as a nickel, so you should always choose 
the dime” . In a rather exasperated tone, his daughter replies “Daddy, but then 
people will not offer me any money” . 

This story is an example of a decision process: a sequence of decisions is 
made, although the process may terminate before all potential decisions have 
been taken. The story also illustrates two components of what is considered to 
be strategic behaviour. First, immediate rewards are forgone in the expectation 
of a payback in the future. Second, the behaviour of others is taken into account. 
It is the former component that is the main subject of this chapter. While the 
second component may be present in some of the situations we will look at, the 
behaviour of all individuals other than the one being considered will be taken 
as fixed. In Part II, we will allow all players to change their behaviour at will. 

To represent the problems like the nickel and dime game pictorially, we 
can draw a decision tree. The times at which decisions are made are shown 
as small, filled circles. Leading away from these decision nodes is a branch for 
every action that could be taken at that node. When every decision has been 
made, one reaches the end of one path through the tree. At that point, the 
payoff for following that path is written. The convention we will follow is for 
time to increase as one goes down the page, so the tree is drawn “upside-down” . 
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Figure 2.1 Choosing a nickel (iV) or a dime (D) on (at most) two occasions. 
The payoff in cents is given at the end of each branch of the tree. 



Example 2.1 

Suppose that the adult will offer the “nickel or dime” choice at most twice: 
if the girl takes the dime on the first occasion, then the choice will be offered 
only once. The nickel and dime problem can then be represented by the tree 
shown in Figure 2.1. If she chooses a dime (action D) at the first opportunity, 
then she receives ten cents and no further offer is made. On the other hand, 
if she chooses the nickel (action N), she gets five cents and a second choice. 
It is clear what the girl should do. If she chooses the nickel the first time and 
then the dime, she gets a payoff of fifteen cents; if she follows any other course 
of action, she gets only ten cents. Therefore, she should choose the nickel first 
and then the dime. 



2.2 Strategic Behaviour 

The word “strategy” is derived from the Greek word strategos {arparejog) 
meaning “military commander” and, colloquially, a strategy is a plan of action. 



Definition 2.2 

A strategy is a rule for choosing an action at every point that a decision might 
have to be made. A pure strategy is one in which there is no randomisation. 
The set of all possible pure strategies will be denoted S. 
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Figure 2.2 Two decision trees that could have the pure strategy set given in 
Example 2.3. 

Suppose that there are n decision nodes and that at each decision node i 
there is an action set describing the choices that can be made at that point. 
Some or all of the sets may be identical. Then the set of pure strategies S 
is given by the cross-product of all the action sets: S = Ai x A2 x • • • x A„. 



Example 2.3 

Suppose there are three decision nodes at which the action sets are A = 
{01,02}, B = {61, 62} and C = {ci,C2|. Then the set of pure strategies is 
given by the set of eight triples 

S = {0161C1, 0161C2, 01&2C1, 01&2C2, 02&1C1, 02&1C2, 02&2C1, 02&2C2} • 

This strategy set could apply to either of the decision trees illustrated in Figure 

2 . 2 . 



Definition 2.4 

The observed behaviour of an individual following a given strategy is called the 
outcome of the strategy. 
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The definition of a strategy leads to some redundancy in terms of outcomes. 
A pure strategy picks a path through the decision tree from the initial point to 
one of the terminal points. However, a pure strategy is not just a path through 
the decision tree: a pure strategy specifies the action that would be taken at 
every decision node including those that will not be reached if the strategy is 
followed. In other words, the observed behaviour of an individual only provides 
us with part of the strategy itself. 

Example 2.5 

Consider the “nickel or dime” game shown in Figure 2.1. The pure strategy set 
is S = {NN, ND, DN, DD}, where each pair of actions represents the choices 
made in the natural (time-increasing) order. Two of these strategies, DN and 
DD, yield the same outcome because choosing the dime at the first decision 
node means that no further decisions have to be made. 

Because strategies that give the same outcome lead to the same payoff, it is 
sometimes useful to introduce the concept of a “reduced” set of pure strategies, 
which removes this redundancy from the discussion. 



Definition 2.6 

A reduced strategy set is the set formed when all pure strategies that lead to 
indistinguishable outcomes are combined. 



Example 2.7 

For the “nickel or dime” game shown in Figure 2.1. The reduced strategy set 
is Sfl = {A^iV, ND, DX}, where the combination DX means “dime at the first 
decision node and anything at the other decision node” . 

Exercise 2.1 

(a) Consider a variant of the “nickel or dime” game from Example 2.1 
where the child is offered nickels or dimes on three occasions at most. 
Draw the tree for this decision problem, determine the pure-strategy 
set and find the optimal strategy? 

(b) Suppose the child is offered the nickel or dime choice on n occasions. 
What is the optimal strategy? 

(c) Suppose the adult offers the child a choice between a nickel or a 
dime. If the child takes the dime, then the game stops. If the child 
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takes the nickel, then the choice is offered again with probability p. 
If p < 1, then the game will eventually terminate, perhaps because 
the adult gets bored. What is the child’s optimal strategy? 



2.3 Randomising Strategies 

When there is only a single decision to be made, the sets of actions and pure 
strategies are identical. There is also only one way of specifying randomising 
behaviour. 



Example 2.8 

Suppose the action (or pure strategy) set is {ai, 02 }. A general behaviour spec- 
ifies using oi with probability p and 02 with probability 1 — p. In Section 1.5, 
we denoted this by /? = (p, I — p) . 

When there is (potentially) more than one decision to be made, the ac- 
tion sets and pure strategy sets are no longer identical and there are now two, 
conceptually different ways of representing a randomizing behaviour. To dis- 
tinguish between them we shall call one a “mixed strategy” and the other a 
“behavioural strategy” . 

Definition 2.9 

A mixed strategy a specifies the probability p(s) with which each of the pure 
strategies s G S is used. 

Suppose the set of strategies is S = {sa, S{,, Sc, . . .}, then a mixed strategy 
can be represented as a vector of probabilities: 

(p(Sa),p(sb),p(Sc),---) • 

A pure strategy can then be represented as a vector where all the entries are 
zero except one. For example. 



Sfc = (0,1,0,...) . 

Mixed strategies can, therefore, be represented as linear combinations of pure 
strategies: 

^ ■ 
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Remark 2.10 

Often these linear combinations are written symbolically. For example, in the 
“nickel or dime” game, the mixed strategy in which NN is used with probability 
j and DN is used with probability | might be written as 

cr = -NN+-DN . 

4 4 



Definition 2.11 

The support of a mixed strategy a is the set S((t) C S of all the pure strategies 
for which cr specifies p{s) > 0. 

For a mixed strategy, the randomisation takes place once before the decision 
tree is traversed: once a strategy has been chosen, the path through the tree is 
fixed. 



Definition 2.12 

Let the decision nodes be labelled by an indicator set I = {1,2,3, .. . n}. At 
node i, the action set is = {a{, a^, . . . , a{,. } (where we have allowed the 
number of available actions ki to be different at each decision node i). An 
individual’s behaviour at node i is determined by a probability vector pi where 
Pi = {p{a\),p{al), . . . and p{aj) is the probability with which he selects 

action a* € Ai (if, in fact, they reach decision node i). A behavioural strategy 
(3 is the collection of probability vectors 



/3 = {Pl,P2,---,Pn} • (2.1) 

In contrast to a mixed strategy, a behavioural strategy causes randomisation 
to take place several times as the decision tree is traversed. 

As we shall see, these two representations of randomising behaviour are 
interchangeable in the sense that every mixed strategy has an equivalent be- 
havioural representation and every behavioural strategy has an equivalent 
mixed representation. In each case, a strategy defined in terms of one type 
of representation may have more than one equivalent strategy defined in terms 
of the other representation. Before we give a proper definition, let’s look at the 
idea of equivalence by means of an example that illustrates two strategies that 
are not equivalent. 
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Example 2.13 

Consider the “nickel or dime” game shown in figure 2.1. One mixed strategy for 
this decision process is cr = ^NN+ \DD. It would be tempting to believe that 
a behavioural equivalent to this strategy is/3 = but this would 

be incorrect. To see why, note that there are three paths through the decision 
tree. Let’s call them “dime only”, “all nickels” and “nickel then dime”. The 
mixed strategy a picks out the paths “dime only” and “all nickels” each with 
a probability | and picks “nickel then dime” with probability zero. However, 
the behavioural strategy (3 specifies choosing the action D at the later decision 
node with probability Therefore, the path “nickel then dime” would be 
picked with probability | and not zero. The strategies a and [3 are, therefore, 
not equivalent. 

Definition 2.14 

A behavioural strategy and a mixed strategy are equivalent if they assign the 
same probabilities to each of the possible pure strategies that are available. 

It follows immediately that equivalent mixed and behavioural strategies 
have the equal payoffs. 



Example 2.15 

A behavioural strategy which is equivalent to the mixed strategy a in exam- 
ple 2.13 is /? = ((^, , (1,0)). Furthermore, any of the mixed strategies 



1 , , 


\ 




r 


Gx = 3- 


--x] DD + xDN 

k2 y 


with 


[O,- 



is equivalent to the behavioural strategy (3 = ((5j5)j(1)0)). 



Exercise 2.2 

Show that the following behavioural and mixed strategies for the “nickel 
or dime” game of Example 2.1 all have the same payoff. 




a 



DD + xDN with x € 0, - 
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Theorem 2.16 

(a) Every behavioural strategy has a mixed representation and (b) every mixed 
strategy has a behavioural representation. 



Proof 



(a) An individual using a pure strategy s G S will pass through a set of decision 
nodes I(s) C I, choosing some action o*(s) G for each i G I(s). At each 
decision node t G I, a given behavioural strategy f3 would prescribe choosing 
that action with probability p(a*(s)). So the probability that an individual 
using j3 would traverse the decision tree via the decision nodes I(s) is 

p{s) = p(a*(s)) • 

iei{s) 

A mixed strategy representation of (3 is then 
because 

n 1 • 

sCS iel(s) 

(b) Let a = he some mixed strategy. For each pure strategy 

s, let I(s) C I be the set of decision nodes an individual encounters when he 
follows strategy s. For each decision node i G I, let S{i) C S be the set of pure 
strategies that reach decision node i. Then the probability that an individual 
following the mixed strategy a will reach decision node i is 

pAi) = pA) ■ 

sCS(i) 

Let S(a*,i) C S{i) be the set of pure strategies that reach decision node i and 
choose action a* G at that point. Then the probability that an individual 
following the mixed strategy cr will reach decision node i and choose a* is 

Pa{a\i)= pA) ■ 

sGS(a*,2) 



Provided Pa-{i) yf 0 we can define the probability of choosing a* at i as 



p{a^) 



Pcr{i) 



If Pcr{i) = 0 for some decision node i then any set of probabilities p(a*) with 
X)a*GA suffice. Clearly X)a*GA = 1 Vi G I, so the collection 
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Figure 2.3 Decision tree for Exercise 2.3. 

of these probabilities for all actions at all decision nodes forms a behavioural 
representation of the mixed strategy cr. □ 

It follows from this theorem that we are free to choose the representation 
for strategies that best suits the problem in hand. 

Exercise 2.3 

Consider the decision tree shown in Figure 2.3. Find the all behavioural 
strategy equivalents for the mixed strategies (a) a = |ai6iCi + ^026202 
and (b) a = |oi6iCi + |ai62Ci + |ai6iC2. 



2.4 Optimal Strategies 

In Chapter 1, we saw that randomising behaviour was not required for single 
decisions, in the sense that an optimal action could always be found. A similar 
result holds for decision processes. 

Lemma 2.17 

Let a* be an optimal mixed strategy with support S*. Then 7r(s) = 7t(ct*) 
Vs G S*. 
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Proof 

If the set S* contains only one strategy, then the theorem is trivially true. 
Suppose now that the set S* contains more than one strategy. If the theorem 
is not true, then at least one strategy gives a higher payoff than 7t(ct*). Let s' 
be the strategy that gives the greatest such payoff. Then 

= ^P*(’S)7t(s) +P*(s')7t(s') 

s^s' 

< '^P*{s)tt{s')+p*{s')tt{s') 

s^s' 

= 7t(s') 

which contradicts the original assumption that a* is optimal. □ 



Theorem 2.18 

For any decision process, an optimal pure strategy can always be found. 



Proof 

From Theorem 2.16, we know that every behavioural strategy has at least one 
equivalent mixed strategy. It follows that no behavioural strategy can have a 
payoff greater than that which could be achieved by using a mixed strategy. It, 
therefore, follows from the preceding lemma that, if an optimal mixed strategy 
exists, then an optimal pure strategy also exists. □ 

So far, we have adopted a simple procedure for finding a optimal strategy: 
list the possible pure strategies, calculate the payoff for each of these, and pick 
one that gives the optimal payoff. However, the burden of the procedure in- 
creases exponentially as the decision tree becomes larger. A tree with n decision 
nodes each with 2 possible actions leads to 2” pure strategies. Fortunately, we 
can reduce this burden by employing the Principle of Optimality. This principle 
states that from any point on an optimal path, the remaining path is optimal 
for the decision problem that starts at that point. In other words, to find the 
optimal decision now, we should assume that we will behave optimally in the 
future. 
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Definition 2.19 

A partial history h is the sequence of decisions that have been made by an 
individual up to some specified time. At the start of a decision process (when 
no decisions have been made), we have the null history, ft. = 0. A full history 
for a strategy s is the complete sequence of all decisions that would be made 
by an individual following s and will be denoted H (s) . 



Remark 2.20 

If an individual has perfect recall (i.e., remembers all their past decisions), then 
each decision node has a unique history and each history specifies a unique 
(current) decision node. 

Define the subset of pure strategies S'(ft) G S that contains all the strategies 
with history ft but that differ in the actions taken in the future. Then the 
optimal payoff an individual can achieve given that they have history ft is 

7T*(s|ft) = max 7t(s). 
s&Sih) 

Assume that the individual now has a choice from a set of actions A{h). After 
that decision has been made, the history will be the sequence ft with the chosen 
action a appended. We will write this as h, a. 



Theorem 2.21 (The Optimality Principle) 

For an individual with perfect recall: 

1. 7 T*(s|ift(s)) = 7t(s) 

2. 7 T*(s|ft) = maxag A(/i) 7 '‘*(s|ft, a) 

3. 7 T* = max 5 gs( 0 ) 7 T*(s| 0 ) 



Proof 

1. By the definition of H{s), the individual has no more decisions to make 
and the best payoff they can get is the payoff they have already achieved 
by following the strategy s. 

2. A pure strategy is a sequence of actions {oq, Oi, . . . , ah, ah+i, a/i+ 2 , ■ • ■ , Off} 



so 



7t(s) "^(oq , a\, . . . , Oh, ah-\-l, • ■ • j O// ) . 
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Let the partial history ft, be a given sequence {oq, oi, . . . , a?i} then 
7T*(s|ft) = maxmax . . . max7r(ao, aij • ■ • , Oft,, o/i_|_ 2 , . . . , a//) 

o-h + l O-h+2 O-H 

= max7r’*'(s|ft, 

ah+l 

3. The history ft = 0 denotes optimisation problem starting from the very 
beginning. So S'(0) = S and 

max 7r*(s|0) = max7r(s) 

sgS(0) ses ^ ' 



□ 

The Optimality Principle leads directly to a convenient method for solving 
dynamic decision problems. If we wish to find the optimal decision now by 
assuming that we will behave optimally in the future, it makes sense to sort 
out the future decisions first. In other words, we should work backwards through 
the decision tree - a procedure known as backward induction. 

Example 2.22 

Consider the decision tree shown in Figure 2.4. To work backwards through 
this tree, we start at either decision node 2 or decision node 3. It does not 
matter which: all that matters is that no decision node is considered before all 
the decision nodes that follow on from it have been dealt with. At decision node 
3 we would choose C (rather than D), which gives us a payoff of 8. At decision 
node 2 we would choose D to get a payoff of 7. Now consider decision node 
1. Assuming that we will choose optimally in the future whatever we do now, 
choosing A leads to a final payoff of 7 whereas choosing B leads to a payoff of 
8. The optimal strategy is therefore BDC (in the order of the labelling of the 
decision nodes). 

The previous example shows, at least, that backward induction produces 
the same result as a complete strategic analysis. However, the advantages of the 
approach seem fairly minimal. The real power of backward induction reveals 
itself when we consider problems for which drawing a complete decision tree 
is impractical if not actually impossible. Such problems are considered in the 
next chapter. 

Exercise 2.4 

Consider a female bird choosing a mate from three displaying males. The 
attributes of the males are summarised by the following table. 





2.4 Optimal Strategies 



35 




Figure 2.4 Decision tree for Example 2.22. 



Male 


Genetic quality 


Cares for chicks? 


1 


High 


No 


2 


Medium 


Yes 


3 


Low 


Yes 



Suppose that the value of offspring depends on the genetic quality of 
the father. The value of offspring is VH^ vm, and vl for the males of 
high, medium, and low quality, respectively, with vh > vm > vl- Once 
she has mated, the female can choose to care for the chicks or desert 
them. Chicks that are cared for by both parents will certainly survive; 
those cared for by only one parent (of either sex) have a 50% chance of 
survival; and those deserted by both parents will certainly die. Draw the 
decision tree and find the female’s optimal strategy. 






3 

Markov Decision Processes 



3.1 State-dependent Decision Processes 

In this chapter, we add an extra layer of complexity to our models of decision 
making by introducing the idea of a state-dependent decision process. The 
processes we will consider can either be deterministic or stochastic. To begin 
with, we will assume that the process must terminate by an a priori fixed 
time T (a “finite horizon” model). In principle, decisions can be made at times 
t = 0, 1, 2, . . . , T — 1, although the actual number of decisions made may be 
fewer than T if the process terminates early as a consequence of the actions 
taken. Models that have no a priori restriction on the number of decisions to 
be taken (“infinite horizon” models) will be considered in the next chapter. 

We consider individuals to have some state variable x taken from a set 
X that may be either discrete or continuous. This state could represent an 
individual’s wealth, number of offspring, need for food, etc. We allow individuals 
to condition their behaviour on the state they find themselves in at any given 
time, and we will denote the set of actions available in state x at time t by 
A{x,t). The action taken causes a transition to a new state: that is, at time 
t, the action at induces a transition xt — >■ Xt+i- Thus we are considering a 
deterministic process that consists of the sequence of pairs (xt,at) for t = 
0 , 1 , 2 ,.... 
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Definition 3.1 

An alternating sequence of states and actions xq, ao, xi,ai,X2 , ... is called the 
history of a process. We will denote by ht = (xq, uq, xi, ai, . . . , Xt) the history 
of the process up to time t. 

How are decisions to be made? In its simplest form, a pure strategy s could 
just be taken as a sequence of actions oq, oi , 02, . . .. More generally, we can allow 
strategies to specify a choice of action a(x, t) for each state and each decision 
time. Starting from xq, this strategy generates a history xq, ag, xi, ai, X2, ■ ■ ■ 
where we have written at = a(xt, t). We will assume that this history generates 
a sequence of rewards rt{xt, at) for t = 1,2, . . . ,T — 1. That is, an individual 
in state Xt at time t who uses action at will receive an immediate reward 
of rt{xt,at). For finite processes, we also include an optional terminal reward 
rrixr) that is received at the end of the process. The total reward obtained 
starting in state x if an individual follows a strategy s is given by 

T-l 

7 t ( x | s ) = ^ rt{xt, at) + rrixT)- 

In some problems, the process may start in a known state xg- In which case, 
we only have to consider one payoff, namely 7r(a;o|s). 

We have so far assumed that the state transition caused by choosing action a 
is deterministic. We will now consider stochastic decision processes in which the 
state at time t is a random variables, which we denote by Xt- The probabilities 
for the state transition xt — >■ Xt+i can, in general, depend on the whole history 
of the process as well as the action chosen at 



P{Xt+i = xt+i) = p{xt+i\ht,at) 



with 



^ p{xt+i\ht,at) = 1 . 



for all times, histories and actions. We should also consider randomising strate- 
gies a such that the action chosen is also a random variable At . The total reward 
obtained starting in state x if an individual follows a strategy a is given by 



'k{x\u) 



[T-l 



E 



At) + rxiXr) 



T-l 

y]Eh(Xi,A)]+E[rT(XT)]- 
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We will use the notation 7t*(x) for the payoff obtained by following the optimal 
strategy starting from state x. That is, 

7T*(x) = 7r(a;|CT*) . 

An individual’s aim is to maximise this expected total reward. 



3.2 Markov Decision Processes 

There is a special class of decision processes in which the state transition prob- 
abilities depend only upon the current state and not on how that state was 
reached. 

Definition 3.2 

A decision process is said to have the Markov Property if p{xt+i\ht,at) = 
p{xt+i\xt,at). 



Definition 3.3 

A decision process with the Markov property is called a Markov Decision Pro- 
cess (MDP) (named in honour of the Russian mathematician Andrei Andree- 
vich Markov). 

From now on, we will consider only MDPs and not more general decision 
processes. Clearly, it is not an easy task to compute the payoff for a general 
strategy and hence find an optimal one. But for finite horizon MDPs the method 
of backward induction (the Principle of Optimality - see Section 2.4) comes to 
our rescue. 

We begin our discussion of Markov Decision Processes by considering a 
simple example. The example is deterministic and can be solved using the 
Lagrangian method for constrained optimisation (see Appendix A). We will 
show that the same optimal strategy can also be found by backward induction 
(dynamic programming). 



Example 3.4 

Consider an investor making a sequence of decisions about how much of their 
current capital to consume (i.e., spend on goods) and how much to invest. 
That is, at time t the investor’s capital Xt is reduced by an amount Ct and the 
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remainder is invested at an interest rate r — 1 to produce an amount of capital 
r{xt — Cf) at the next decision time. For simplicity, let us restrict ourselves to 
the two-period problem of an investor who starts with known capital xq and 
makes decisions about consumption at t = 0 and t = 1. We assume that the 
investor only gains immediate benefit from consumption (the only reason for 
investment is to obtain the benefit of consumption in the future). We shall also 
assume that the investor’s utility for consumption is logarithmic, i.e. 7 t(co, ci) = 
In(co) -P In(ci). 

First we solve this problem using the Lagrangian method. The state equa- 
tion is x\ = r{xo — Co) and we must have ci < xi so the constraint equation is 
Cl < r{xQ — co). Therefore, the Lagrangian is L(co,ci) = In(co) -Pln(ci) — A(ci -P 
rco — rxo) and we must simultaneously satisfy the following three equations. 





c* -P red ~ ^ 2:0 



0 

0 

0 



Solving the first pair of equations provides the following relation between the 
consumptions during the two periods: c* = tcq. Substituting this back into the 
constraint equation gives the optimal strategy as 

Cq = and cj = ^rxo 

To solve the same problem again by the method of backward induction 
we proceed as follows. At t = 1 the payoff is In(ci) subject to the constraint 
Cl < Xi where xi is the amount of capital that the investor has at this time. 
The optimal decision is, therefore, c\{x\) = x\. Note that we don’t know the 
value of x\ because it depends on the behaviour at t = 0 through the state 
equation x\ = r(xo — cq). So what we have is an optimal decision for any value 
of xi. At t = 0, the investor’s problem is to maximise the total payoff assuming 
optimal behaviour at t = 1, i.e., find 



cd G argmax (In(co) -P ln(a;i(co))) . (3.1) 

coGfO.xo] 

Differentiating the payoff and setting the result to zero gives x\ — red = 0. 
Substituting for x\ using the state equation gives Cq = ^xq- (We could write this 
as Cq(xq) but because xq is assumed to be known, we can drop this dependence.) 
Thus the optimal strategy can be written as s*{x,t) = (cd,cd(a;i)) with 

, 1 

Cq — 2^0 



and c\{x\) = x\. 
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Although this solution looks slightly different from the one found by the La- 
grangian method it is, in fact, identical because 

c\{xi) = r{xo-cl) 

= \rx,. 



Exercise 3.1 

Consider a three-period consumption and investment model with loga- 
rithmic utility for consumption. Apart from the change to three periods, 
make the same assumptions that were used in Example 3.4. 

(a) Find the optimal consumption strategy using the Lagrangian method. 
[Hint. We can rewrite the constraint for the two-period problem in the 
form of total consumption discounted to initial (period 0) value 

Cl 

Co H < xo- 

r 

You may find it useful to write the constraint for the current problem in 
this way.] 

(b) Solve the model by backward induction and show that the solution 
is identical to the one obtained using the Lagrangian method. 

Backward induction in a state-dependent problem is often called “dynamic 
programming” . Equation 3.1 is an example of a dynamic programming equation, 
and it is just the second equation from Theorem 2.21 written in a way that is 
suitable for state-dependent decision processes. As a prelude to the introduction 
of stochastic dynamic programming we will now develop a general form for the 
dynamic programming equation in the deterministic case. 

The basis of dynamic programming is answering the following question. 
What is the best action now, assuming optimal behaviour at all potential future 
decision points? The word “potential” is included to indicate that we have to 
know what would be done in all future states, including those that may not 
be reached once the optimal decision has been found and taken. Without that 
information, we could not decide what is optimal. So, let us define 7Tf(x) to be 
the future payoff for starting in state x at time t providing we behave optimally 
for times t,t + l,t + 2, . . . ,T — 1. That is, 

T-l 

<(a;) = XI (3-2) 

T — t 

where xt = x and the sequence of states j, is generated by following 

the sequence of optimal actions t-i- then write the general 
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form of the deterministic dynamic programming equation as 

TTt{x) = max [rt{x,a) + 7 t*+i (a;t+i(a))] (3.3) 

aeA(:c) 

where Xt+i{a) is the state reached from x by using an action a taken from the 
set of actions available in state x, A{x). The backward induction process is 
started by setting 

TT^{x) = rTixr) Vcc G X 

and the payoff achieved by the optimal strategy s* for the starting state(s) of 
interest x is given by 7r(a;|s*) = 7 Tq(x). 

Exercise 3.2 

Relate the various elements of Example 3.4 to the elements in the general 
description of dynamic programming. 



3.3 Stochastic Markov Decision Processes 

In a deterministic MDP, the choice of action in a particular state uniquely 
determines the state of the process at the next decision point. (In some appli- 
cations, therefore, the actions are conveniently described in terms of choosing 
the next state.) For the rest of this chapter, we will consider decision processes 
in which the transitions between states may be uncertain. We will assume that 
these transition probabilities are time-independent: given that an individual is 
in state x at time t and chooses action a, the probability that they find them- 
selves in state x' at time t -P 1 is p{x'\x,a) < 1, Vx' G X. Although we have 
made this “stationarity” assumption, the optimal strategy may nevertheless be 
time-dependent. We have seen this already in Example 3.4 - a deterministic 
MDP is, after all, just a stochastic MDP where all the transition probabilities 
happen to be either 0 or 1. That example also clearly illustrates the fact that 
the time dependence can arise from the finiteness of the problem, because we 
have a zero terminal reward in every state. 

Under certain conditions, a stochastic MDP has a simple diagrammatic 
representation. These conditions are as follows: the number of states is small; 
the number of actions available in each state is small and independent of time 
(at least for t < T); and the rewards obtained for each state-action pair are 
independent of time {rt{x,a) = r{x,a) for t < T). This diagrammatic repre- 
sentation is best introduced by means of an example. 
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X 



y 



Z 



Figure 3.1 Diagrammatic representation of the stochastic MDP described in 
Example 3.5. 

Example 3.5 

The set of states is X = {x,y,z}. In states x and y, we can choose an action 
from the sets A(cc) = A(y) = {a, b}, and in state we have the single-choice 
set A(t;) = {6}.^ If we choose action a in state x, then we receive a reward 
r{x,a) = 2 and move to state y with probability 1. If we choose action b 
in state x, then we receive a reward r(x, 6) = 3 and remain in state x with 
probability 1. In state y, if we choose a, then we receive r{y, o) = 5 and move 
to state X with probability 1, whereas choosing b gives us r(y, b) = 10 and 
we transfer to state z with 50% probability and remain in state y with 50% 
probability. If we find ourselves in state z, then we can only choose b, which 
gives us r(z,b) = 0 and we remain in state z with probability 1. 

This lengthy description can be presented much more concisely by means 
of the diagram shown in Figure 3.1. In the case of a finite horizon problem this 
diagrammatic description must be supplemented by specifying the horizon T 
and the final rewards rxix), \/x € X. 



Remark 3.6 

Note that in Figure 3.1 the state z is absorbing: if the process ever arrives in 
state z it stays there. Furthermore, the payoff received in state z is zero. The 
existence of a zero-payoff, absorbing state is quite common in MDP models. 

^ A choice without an alternative is often known as Hobson’s choice, though the 
phrase is also applied to “take it or leave it” choices, which would be a two- 
element set. The term is said to originate from Thomas Hobson (ca. 1544-1631) 
who owned a livery stable at Cambridge, England. He allegedly required every 
customer to take either the horse nearest the stable door or none at all. 
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For example, in a biological context, a transition to state 2 could represent the 
death of the organism. 



To solve problems such as that given in Example 3.5, we will need a stochas- 
tic version of the dynamic programming equation (Equation 3.3). Fortunately, 
this is easy to write down. Define TTt{x\a) to be the future payoff for choosing ac- 
tion a in state x at time t and behaving optimally for times t-P 1, t-|-2, . . . , T— 1. 
That is, 

TTt{x\a) = r{x,a) -P ^ p{x'\x,a) (x') . 

x'ex 

Now define 7Tf(x) to be the future payoff for starting in state x at time t 
providing we behave optimally for times t,t + l,t + 2, . . . ,T — 1. (As before, 
7 tJ‘(x) is given by Equation 3.2.) The stochastic dynamic programming equation 
is then 



ttNcc) = max7rt(a|a;) 

aeA 



(3.4) 



= max 

aGA 



rt{x,a) + ^ p{x'\x,a) {x') 

x'^X 



(3.5) 



If two actions, say a and b, both lead to the maximum future payoff tt*{x), 
then either can be chosen. (In fact, the combination “a with probability p and 
b with probability 1 — p” also gives the same maximum future payoff, but in 
this case the randomisation is best regarded as a way of breaking the tie rather 
than as a necessity. See Theorems 1.32 and 2.18.) 

Except in very rare cases, stochastic MDPs are not solved for arbitrary 
parameter values. Take the problem shown in Figure 3.1, for example. If all the 
rewards, transition probabilities, and the horizon T were left unspecified, there 
would be 15 parameters to deal with. Even if an analytic solution could be 
found (which would be difficult), understanding the way a solution changes as 
all these parameters are varied is not really feasible. Instead, the usual approach 
is to fix all of the parameter values and find a solution numerically. Often a 
computer program is employed; but if the number of states, the number of 
actions and the time horizon are all small it is possible to do this “by hand” . 



Example 3.7 

Consider the MDP shown in Figure 3.1. We will additionally assume that 
decisions are to be made at times t = 0, 1 and 2 (i.e., T = 3) and that 
rsix) = r3{y) = ra(z) = 0. 

The first thing to note is that in state z, there is no choice to be made: 
a*{z,t) = b and 7rJ‘(z) = 0, Vt. The absence of a terminal reward rrix) = 0 
also gives us 7r|(a;) = ttK?/) = 0. 
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At time t = 2 in state x, we have 

7T2(a;|o) = 2 + 7T3 (i/) = 2 
7T2(a;|6) = 3 + 7T3(a;) = 3. 

So a*{x,2) = h and n^ix) = 3. In state y, we have 

7T2(?/|a) = 5 + 7T3(a;)=5 

T^ 2 {y\b) = 10+ i7T3(?/) + ^713(2) = 10. 

So a*{y,2) = b and ir^iy) = 10. 

At time t = 1 in state x, we have 

7Ti(a;|a) = 2 + 772 ( 2 /) = 12 

7Ti(x| 5) = 3 + 7T2(a;)=6. 

So a*{x, 1) = o and 7r*(a:) = 12. In state y, we have 

TTi{y\a) = 5 + 772 ( 0 ;) =8 

TTi{y\b) = 10+ ^ 772 ( 2 /) + ^772(z) = 15. 

So a*{y,2) = b and TT*{y) = 15. 

At time t = 0 in state x, we have 

77o(x|a) = 2 + 773 ( 2 /) = 17 
TTo{x\b) = 3 + 77*(a;) = 15. 

So a*(a;,0) = a and 77g(a;) = 17. In state y, we have 
77o(2/|a) = 5 + 773 ( 0 ;) = 17 
TTo{y\b) = 10+ ^ 77 ^( 2 /) + ^77^(z) = 17.5. 

So a* ( 2 /, 2) = b and 773 ( 2 /) = 17.5. 

The solution of the problem is, therefore, the optimal strategy^ 



t = 0 t=l t = 2 




and a payoff of 17 if the process starts in state 0 ; or a payoff or 17.5 if the 
process starts in state y. 

^ At least, it is the optimal backward induction strategy. We will see later that it is, 
in fact, optimal in the sense that it is the best of all possible strategies. 
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Exercise 3.3 

Use the strategy in Equation 3.6 to follow the decision process forward 
in time (i.e., starting at t = 0 then moving to t = 1, etc.). Check that 
the strategy, indeed, produces the expected payoffs ttq{x) and 7rg(y) that 
were found by backward induction. 



3.4 Optimal Strategies for Finite Processes 

We now turn to the question of whether the pure strategy found by dynamic 
programming is truly optimal. It turns out that the strategy found by dynamic 
programming is still optimal when the class of possible strategies is enlarged 
to include more general types of strategy - while other strategies may do as 
well as the dynamic programming strategy, none can do better. Let us first 
consider the possibility of randomising strategies that depend only upon the 
current state. 

Definition 3.8 

Denote the set of actions available in state x at time t by A{x,t). A general 
Markov strategy (3 specifies using action a € A{x,t) with probability f{a\x,t) 
where 

f{a\x,t) = l. 

The set of all Markov strategies will be denoted by B. 

Theorem 3.9 

Consider a finite-horizon Markov Decision Process and let s* be a strategy 
found by dynamic programming. Then s* is an optimal Markov strategy for 
that process. 

Proof 

The proof proceeds by induction on t (backward in time, naturally). Let TTt{x\f3) 
be the expected future payoff at time t for using a strategy /3 given that the 
decision process is in state x at that time. Assume that the backward induction 
strategy is optimal for some time t + 1: 



TTt+i{x\l3) < 7Tt+i(x|s*) = Trl^i{x) Vx G X and V/3 G B. (3.7) 
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Then (for arbitrary /3) 




’^t{x\(i) = ^ f{a\x,t) 


r(x,a)+ ^ p{x'\x,a)Tit+i{x'\(3) 




x'^y. 


< ^ f{a\x,t) 


r(x,a)+ ^ j^(x'|x,aX+i(xO 


/ 


x'^y. 

\ 


< /(a|a;,t) j 7T*(a;) 



a^A.{x,t) 



= < (a^) 

where the first inequality follows from the inductive assumption (Equation 3.7) 
and the second follows from the stochastic dynamic programming equation. 
Because 

TTx{x\P) = rx{x) = TTx{x) Vx G X. 

The inequality 'Kt{x\(3) < holds for all x and t. In particular, the opti- 
mality condition tto{x\I3) < 7Tq(x) Vx € X holds. □ 

Having established that the strategy found by dynamic programming is 
an optimal Markov strategy, we now consider whether enlarging the class of 
available strategies leads to a better strategy. 

Definition 3.10 

Denote the set of actions available in state x at time t by A{x,t). For each 
history ht{x) that leads to state x at time t, a behavioural strategy specifies 
using action a G A{x,t) with probability (j>{a\ht{x),t), where 

^ <j){a\ht{x),t) = 1. 

a^A.{x,t) 

Markov strategies are a subset of the set of behavioural strategies - the 
subset where decisions are conditioned only on the part of the history that 
specifies the current state. 

Theorem 3.11 

No behavioural strategy gives a higher payoff than the strategy found by dy- 
namic programming. 
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Proof 

The proof proceeds by showing that for every general behavioural strategy, 
there is a Markov strategy that gives the same expected payoff. Consequently, 
no behavioural strategy can do better than every Markov strategy. Because 
dynamic programming strategies are optimal Markov strategies, the desired 
result follows. See Filar & Vrieze (1997) for details. □ 

Exercise 3.4 

Consider the MDP shown below with T = 3 and terminal rewards 
fsix) = rs^y) = r^^z) = 0. Find the optimal strategy. 





X 



y 



z 



3.5 Infinite-horizon Markov Decision Processes 



We now consider processes in which there is no a priori termination: the process 
continues forever, unless some decision is taken that has the consequence of 
ending the process. 

When considering an infinite-horizon process, we might consider the payoff 
when starting in state x as being given by limit as T — >■ oo of the corresponding 
payoff for a finite-horizon process. 



Tr{x\a) 



E 



y^rt{Xt,At) 

J=o 



^E[rt(Vt,Ai)]. 

t=o 



Obviously, because there is no a priori termination of the process, there are no 
terminal rewards. The first question is: does the limit as T — >■ oo exist for all 
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possible strategies? 

Example 3.12 

Consider a very simple process with only one state, x. In that state, the action 
set is A = { 01 , 02 } and r{x,ai) = 1 and r(x, 02 ) = 2, independent of time. 
Whichever action is chosen the process (necessarily) returns to x. Suppose we 
have two strategies: Si = “always choose Oi” and S 2 = “always choose 02 ”. 
Clearly, with the payoff defined as above, we have 

7r(a:|si) = ’n{x\s2) = 00 

so there is apparently no way of choosing the better strategy. 

Instead of considering constant rewards, let us introduce the simple time 
dependence: rt{x,a) = S*r{x,a). Provided the discount factor 6 is such that 
0 < (5 < 1 the payoff 

00 

7T{x\a) =E^ [S^r{Xt,At)] (3.8) 

is finite for all strategies cr and all initial states x. 

Exercise 3.5 

Consider the process described in Example 3.12. Find the payoffs for the 
strategies si and S 2 if rewards are discounted by an amount 0 < <5 < 1. 

Why should discounting be introduced into a model? Apart from being a 
mathematical “trick” to ensure the finiteness of all payoffs as already discussed, 
there are two reasons. First, if the rewards are monetary, the discount factor S 
models a (constant) depreciation of value due to inflation: one unit of currency 
next year will be worth less (i.e., buy less) than one unit of currency today. 
Second, the discount factor can be viewed as the probability that the decision 
process continues for at least one more time step. That is, with probability 
1—5 some catastrophe occurs (independently of any strategy adopted) that 
terminates the decision process. In a biological context, 1 — d is the probability 
that the organism dies as a result of factors not being explicitly considered in 
the problem. 
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3.6 Optimal Strategies for Infinite Processes 

The dynamic programming equation (including discount factor) for a finite 
horizon problem is 

r{x,a) +<5 X! ■ 

x'£X 

An infinite horizon model has the property that after a decision has been made 
an individual will find themselves facing the same infinite horizon decision 
problem as before, albeit starting in a different state. Therefore, it seems rea- 
sonable to guess that the infinite-horizon dynamic programming equation can 
be found by setting 7rJ'(a:) = = 'k*{x) Vx G X in the equation above. 

The following theorem shows that this guess is correct. 



7T( (x) = max 
aG A(af) 



Theorem 3.13 

The optimal payoffs satisfy the dynamic programming equation 



TT*{x) = max 

aG A(af) 



r(a;,a)-|-(5 p(a:'|a;, a)7r*(a;') 

k'GX 



(3.9) 



Proof 



Let a be an arbitrary strategy that chooses action a at t = 0 with probability 
/(a). Then 



n{x\a) = /(«) 

a^A.(x) 



r{x,a) + S p(a:'|a:, a)7Ti(a:'|cr) 



£c'GX 



where TTi{x'\a) is the payoff that a achieves starting from state x' at time t = 1. 
Because 7ri{x'\a) < n*{x') we have 



7r(x|cr) < f{a) 

aG A(tc) 



< Y 



r{x,a) + 6 Y^ p{x'\x,a)n*{x') 

x'GX 

r{x,a) + S Y^ p{x'\x,a)TT*{x') 



aG A(tc) 



max 
aG A(tc) 



max 

a^A.{x) 






r(x, a) + (5 p{x'\x, a)7T*{x') 



tc'GX 
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Because the inequality is true for arbitrary a, it holds for the optimal strategy. 
Hence 



7T*(a;) = 7r(x|(j*) 



< max 
aG A(ai) 



r{x,a)+5 p{x'\x,a)Tr*{x') 



k'GX 



Now, in order to show that the opposite inequality also holds, let 



a € argmax 
aG A(ic) 



r(x,a) + S p(x'jx, a)7T* (x') 



x'ex 



and let cr(a) be the strategy that chooses a at f = 0 and then acts optimally 
for the process starting at time t = 1. Then 

> 7r(a;|CT(a)) 

= r(x,a) + S p(x'jx,d)7r*(x') 
x'ex 

r(x, a) + S p{x'\x, a)TT*{x') | 

k'GX 

Combining the two inequalities completes the proof. □ 

Now we show that the optimal payoff is the unique solution of Equation 3.9. 



= max 
aG A(x) 



Theorem 3.14 

The payoff n*{x) is the unique solution of Equation 3.9. 



Proof 

Suppose 7Ti(a:) and tt 2 (x) both satisfy Equation 3.9. Then, setting 



d{x) € argmax 

aG A(ai) 

we have for each state x 



r(x,a) + S p(a;'|a;, a)7Ti(a;') 



x'GX 



7Ti(x) — 7T2 (x) = 6 max > p{x'\x,a)['Kx{x') — 'K 2 {x')\ 



x'GX 
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Let 



then we have 



- ^ p{x'\x,d{x)) Ui{x') - 7T2(xO| 

< 5y p(a;'|a;, a(x)) max |7Ti(a;') — 7T2(a;')| 

x'^X 

= i5max |7Ti(a:') — 7T2(a;')| . 

x' ^X 



Xm G argmax |7Ti(a;') — 7T2 (x')| 

a'GX 



T^l{Xm) - T^2{Xm) < S \ni{Xm) ~ 7r2{xTn)\ ■ 
Reversing the roles of tti and 7T2 gives 

T^2iXm) - TTl{Xm) < S \TT2{Xm) ~ Tri{Xm)\ ■ 
Combining the two inequalities yields 

\T^l{Xm) - T^2{Xra)\ < 5\TTi{Xm) ~ TT2{Xm)\ 
and, because 5 < 1, we must have 

\TTliXm) - Tr2{Xm)\ = max |7Ti (x') ~ 7T2(a:') | 

x'^X 

= 0 . 



Hence 

|7Ti(x) — 7T2 (x)| =0 Vx € X . 

□ 

Now that we know that the optimal strategy gives the payoff which uniquely 
satisfies Equation 3.9 we can use this fact to prove that a stationary and non- 
randomising strategy is optimal. 



Definition 3.15 

Let s be a non-randomising and stationary strategy that selects action a(x) G 
A(x) every time the process is in state x. Let g{x) : X — >• R be a bounded 
function (i.e., g{x) < oo Vx G X). Define an operator Tg by 

(Tsg){x) = r(x,a(x)) + S ^ p{x'\x,a{x))g{x'). 

Suppose we let Tg act on g{x) and then let Tg act on the result of the first 
operation. We will denote the combined operation by (Tgg)(x). Similarly, the 
n-fold action of Tg will be denoted T”. 
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Lemma 3.16 

For any bounded, real function g{x), limn^oo (T^g){x) = 7r(x|s). 



Proof 



iTsg){x) = r{x,a{x)) +S p{x'\x,a{x)){T^ ^g){x') 

= r(a;, a(cc)) + (5 p{x'\x,a{x))r{x' ,a{x')) 

x'ex 

+6^ p{x"\x',a{x')){Tr^g){x") 

a:"GX 

= E[r(Xo,oo)|s] +E[r(Xi,ai)|s] 

^ p{x"\x',a{x')m-^g){x") 

a:"GX 

Continuing the expansion and because i5 < 1 we have 

OO 

lim (T^g){x) = ^E[r(X(,at)] 

n—^oo < ^ 

= 7r(a;|s). 

□ 



Theorem 3.17 

Let 



\x) 



G argmax 

aG A(a:) 



r(a;,a)+<5 p(a:^|a;, a)7r*(a;^) 

ai'GX 



Vcc G X 



(3.10) 



and let s* be the non-randomising and stationary strategy that selects a*(x) 
every time the process is in state x. Then s* is optimal. 



Proof 

By the definition of s* and using the dynamic programming equation we have 

(Ts.7T*)(a;) = Ti*{x) 



which implies that 



(T,".7t*)(x) = n*{x) Vn. 
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X 



y 



Z 



Figure 3.2 Diagrammatic representation of the Markov decision process for 
Exercise 3.6. 

Now, letting n — >■ oo and using the result of Lemma 3.16, we have 
7r(x|s*) = lim (TJl7r*)(x) 

n—¥oo 

= n*{x). 

□ 

If we can guess the optimal strategy, then all we have to do is check that 
the actions a{x) specified by that strategy satisfy Equation 3.10. 

Exercise 3.6 

Consider the MDP shown in Figure 3.2. Assuming that this as a dis- 
counted infinite-horizon problem with 6 = ^, show that the optimal 
strategy is a*(x) = a and a*{y) = a. (Because being in state z gives the 
highest reward, it seems worth trying a strategy that eventually puts the 
process in state z starting from any state.) [Hint: solve the dynamic pro- 
gramming equation to find the payoffs for following the specified strategy. 
Then show that changing the action chosen in any state gives a lower 
payoff.] 



3.7 Policy Improvement 

The dynamic programming equation suggests the following iterative procedure 
for finding an optimal strategy and its associated payoffs in an infinite-horizon 
MDP. This procedure is often called Policy Improvement because strategies are 
called “policies” by people who study MDPs but not games. 
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1. Start by picking an arbitrary strategy s that specifies using action a(x) in 
state X. 

2. Solve the set of simultaneous equations 

7t(x|s) = r(x, a) + (5 p{x'\x, a)7r(a:'|s) 

ai'GX 

to find the payoffs for using that strategy. 

3. Find an improved strategy by solving 



a{x) € argmax 
aG A{ic) 



r(x, a) + (5 p{x'\x, a)7r(a:'|s) 



x'ex 



4. Repeat steps 2 and 3 until the strategy doesn’t change. 
The following theorem proves that this algorithm works. 



Theorem 3.18 



Suppose we have some stationary pure strategy s that yields payoffs 7r(a;|s). 
Let 



a(x) G argmax 

aG A(a:) 



r(x,a) + S p(a;'|a;, a)7r(a:'|s) 
x'ex 



VxeX 



and let s be the (non-randomising and stationary) strategy that selects d(x) 
every time the process is in state x. Then s is either a better strategy than s 
or both strategies are optimal. 



Proof 

Consider the operator Tg associated with the new strategy s acting on an 
arbitrary bounded function g(x): 

(Tsg)(x) = r(x,a(x)) + X! 

From the definition of d(x), we have 

TsTt{x\s) > 7r(a;|s) Vx G X. 

Acting repeatedly on this inequality with Tg gives 

T"7t(x|s) > TF“^7t(x|s) > • • • > 7t(x|s). 
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X 



y 



Z 



Figure 3.3 Diagrammatic representation of the Markov decision process 
solved in Example 3.19. 



Now, letting n — >■ oo and using the result of Lemma 3.16, we have 

7r(a;|s) > 7r(a:|s) Vx € X. 



So we have shown that the new strategy s is at least as good as the old one. 
Next we will show that if the strategy s is not strictly better than s in at least 
one state, then both strategies are optimal. 

We have just established that 



7t(x|s) > Ts7r(x|s) > 7 t(x|s). 

Now suppose that 7t(x|s) = 7r(x|s) Vx G X. This implies that 

7t(x|s) = Ts7t(x|s) 

= Ts7t{x\s) 



= max 
aG A(ai) 



r(x,a)+<5 p(x^|x, a)7r(x^|s) 



k'GX 



So 7t(x|s) satisfies the optimality equation. By the uniqueness of the solution 
to that equation (Theorem 3.14) we must have 



7t(x|s) = 7T*(x) = 7 t(x|s) 



which proves that s and s are both optimal strategies. 



□ 



Example 3.19 

Consider the problem shown in Figure 3.2 with discount factor Let us 

begin with the strategy sq = {a(x) = b, a{y) = b}. 
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Iteration 1: The payoffs for sq Eire found from 

7r(a;|so) = 2 + i57r(x|so) and 

7!-(y|so) = 5+ i(57r(?/|so) 

which give 7 t(x|so) = 20 and 7r(y|so) = 

Then the payoffs for changing the action taken in each state to a are 

, , , , , , , 9 100 101 

r(a;,a) + d7r(j/|so) = ^+YoTT^Tr 

r{y,a) + ^Stt{x\so) = 3 + ^ x 20 = 12 

Because ^ < 20 and 12 > we can conclude that a{x) = b and a{y) = a. 
Let us call this new strategy si. 

Iteration 2: The payoffs for si are found from 

7r(a;|si) = 2 + i57r(x|si) and 

T^iylsi) = 3 + i(57r(a;|si) 

which give 7 t(x|si) = 20 and 7r(y|si) = 12. 

Then the payoffs for changing the action taken in each state are 

9 118 

r(a;, a) + i57r(i/|si) = 1+— xl2=— and 

r 1. r I ^ r 9 104 

r{y,b) + -6 tt{x\si) = 5+^xl2= — 

from which we can conclude that changing strategy does not yield a better 
payoff. Therefore 

s* = {a(a;) = b,a{y) = a} 

is an optimal strategy. 

The optimal strategy that we have found is one that seems intuitively rea- 
sonable for (5 — >■ 1 because it reduces the probability that the process will end 
up in state z producing an “infinite” stream of zero rewards. 

Exercise 3. 7 

Find the optimal strategy for the previous exercise by starting with 
s = {a{x) = a, a(y) = a} or s = {a(x) = a, a(y) = b}. 





Part II 



Interaction 




4 

Static Games 



4.1 Interactive Decision Problems 

An interactive decision problem involves two or more individuals making a 
decision in a situation where the payoff to each individual depends (at least 
in principle) on what every individual decides. Borrowing some terminology 
from recreational games, which form only a subset of examples of interactive 
decision problems, all such problems are termed “games” and the individuals 
making the decisions are called “players” . However, recreational games may 
have restrictive features that are not present in general games: for example, it 
is not necessarily true that one player “wins” only if the other “loses” . Games 
that have winners and losers in this sense are called zero-sum games; these are 
considered in Section 4.7.3. 

Definition 4.1 

A static game is one in which a single decision is made by each player, and 
each player has no knowledge of the decision made by the other players before 
making their own decision. 

Sometimes such games are referred to as simultaneous decision games because 
any actual order in which the decisions are made is irrelevant . The most famous 
example of an interactive decision problem is probably the Prisoners’ Dilemma. 
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Example 4.2 (Prisoners’ Dilemma) 

Two crooks are being questioned by the police in connection with a serious 
crime. They are held in separate cells and cannot talk to each other. Without 
a confession, the police only have enough evidence to convict the two crooks 
on a lesser charge. The police make the following offer to both prisoners (in 
separate rooms so that no communication between them is possible): if one 
confesses that both committed the serious crime, then the confessor will be set 
free and the other will spend 5 years in jail (4 for the crime and 1 for obstructing 
justice); if both confess, then they will each get the 4- year sentence; if neither 
confess, then they will each spend 2 years in jail for the minor offense. 

We can describe this game more succinctly using the following table of 
payoffs, where the possible courses of action open to each prisoner are (i) Q = 
“Keep Quiet” or (ii) S = “Squeal” . The payoffs are given in terms of years of 
freedom lost. The payoffs for the first prisoner (Pi) are given first in each pair 
of entries in the table; those for the other prisoner (P 2 ) come second. 



P2 





Q 


s 


Q 


- 2,-2 


- 5,0 


s 


0,-5 


-4,-4 



What should each prisoner do? First, consider Pi. If P 2 keeps quiet, then 
they should squeal because that leads to 0 years in jail rather than 2 years. On 
the other hand, if P 2 squeals, then they should also squeal because that leads 
to 4 years in jail rather than 5. So whatever P 2 does. Pi is better off if they 
squeal. Similarly, P 2 is better off squealing no matter what Pi does. So both 
prisoners should squeal. 

The interest in this game arises from the following observation. Both players, 
by following their individual self-interest, end up worse off than if they had 
kept quiet. This apparently paradoxical result encapsulates a major difference 
between non-interactive and interactive decision models (games). It might be 
argued that they should have had an agreement before being arrested that they 
wouldn’t squeal ( “honour among thieves” ) . However, each prisoner has no way 
of ensuring that the other follows this agreement. Of course, a prisoner could 
exact revenge in the future on a squealer ~ but that is another game (with 
different payoffs). 



Definition 4.3 

A solution is said to be Pareto optimal (after the Italian economist Vilfredo 
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Pareto) if no player’s payoff can be increased without decreasing the payoff to 
another player. Such solutions are also termed socially efficient or just efficient. 

The Prisoners’ Dilemma is often used as the starting point for a discussion 
of the Social Contract (i.e., how societies form and how they are sustained) 
because the socially inefficient nature of its solution is reminiscent of many 
features of society. For example, consider paying taxes. Whatever anyone else 
does, you are better off (more wealthy) if you do not pay your taxes. However, 
if no-one pays any taxes (because, like you, they are following their own self- 
interest), then there is no money to provide community services and everyone 
is worse off than if everyone had paid their taxes. 



Example 4.4 (Standardised Prisoners’ Dilemma) 

Any game of the form 



P2 





C 


D 


c 


r, r 


s, t 


D 


t, s 


P,P 



with t>r>p>s is called a Prisoners’ Dilemma.^ A particularly common 
version has payoffs given by t = 5, r = 3, p = 1, and s = 0. The available 
courses of action are generically called “cooperation” (C) and “defection” (D).^ 
Analysis of this game tells us that we should expect both players to defect - a 
solution that is socially inefficient. 



4.2 Describing Static Games 

To describe a static game, you need to specify: 

1. the set of players, indexed by i G {1,2,...}; 

2. a pure strategy set, S^, for each player; 

^ These letters are conventionally used to represent (t) the payoff for yielding to 
temptation, (r) the reward for cooperating, (p) the punishment for defection, and 
(s) the payoff for being a sucker and not retaliating to defection. 

^ In the original version, the two prisoners are playing a game against each other, not 
against the police. So keeping quiet can been viewed as cooperating and squealing 
can be seen as defecting. 
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3. payoffs for each player for every possible combination of pure strategies 
used by all players. 

To keep the notation simple, we will concentrate on two-player games for most 
of this book. Games with more than two players are briefly considered in Sec- 
tion 4.8. In the two-player case, it is conventional to put the strategy of player 
1 first so that the payoffs to player i are written 

Vsi G Si and Vs 2 G S 2 . 



Definition 4.5 

A tabular description of a game, using pure strategies, is called the normal 
form or strategic form of a game. 

Remark 4.6 

It is important to note that the strategic form uses pure strategies to describe 
a game. For a static game, there is no real distinction between pure strategies 
and actions. However, the distinction will become important when we consider 
dynamic games. (See the discussion in Section 2.3 for the importance in non- 
interactive decision problems.) 



Example 4.7 

The strategic form of the Prisoners’ Dilemma is the table shown in Example 4.2. 
The pure strategy sets are Si = S 2 = {<5, -S'} and the payoffs are given in the 
table, e.g., 

7Ti((5, Q) = -2 7Ti((5, S') = -5 7T2((5,S)=0. 



Definition 4.8 

A mixed strategy for player i gives the probabilities that action s G Si will be 
played. A mixed strategy will be denoted Ci and the set of all possible mixed 
strategies for player i will be denoted by Si. 

Remark 4.9 

If a player has a set of strategies S = {sa, ss, Sc, ■ ■ ■} then a mixed strategy can 
be represented as a vector of probabilities: 



(p(sa),p(sb),p(sc),...) ■ 
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A pure strategy can then be represented as a vector where all the entries are 
zero except one. For example, 



Sfc = (0,1,0,...) . 

Mixed strategies can, therefore, be represented as linear combinations of pure 
strategies: 

O' = ■ 

ses 



Usually, we will denote the probability of using pure strategy s by p{s) for 
player 1 and and q{s) for player 2. The payoffs for mixed strategies are then 
given by 

7T*(cri,cr2) = E E P(si)g(s2)7Ti(si,S2) . 

si^Si S2GS2 

As usual, the payoffs are assumed to be a representation of the preferences of 
rational individuals or of their biological fitness, so that an individual’s aim is 
to maximise their payoff (see Sections 1.3 and 1.4). As we have already seen 
in Example 4.2, this “maximisation” has to take into account the behaviour of 
the other player and, as a result, the payoff achieved by any player may not be 
the maximum of the available payoffs. 

Notation 4.10 

A solution of a game is a (not necessarily unique) pair of strategies that a 
rational pair of players might use. Solutions will be denoted by enclosing a 
strategy pair within brackets, such as (A, B) or (cti, 1 J 2 )) where we will put the 
strategy adopted by player 1 first. For example, the solution of the Prisoners’ 
Dilemma can be represented by (S', S'). 

Exercise 4.1 

In Puccini’s opera Tosca, Tosca’s lover has been condemned to death. 
The police chief, Scarpia, offers to fake the execution if Tosca will sleep 
with him. The bargain is struck. However, in order to keep her honour, 
Tosca stabs and kills Scarpia. Unfortunately, Scarpia has also reneged on 
the deal and Tosca’s lover has been executed. Construct a game theoretic 
representation of this operatic plot. 
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4.3 Solving Games Using Dominance 

Because we solved the Prisoners’ Dilemma in an intuitively simple manner by 
observing that the strategy of “Squealing” was always better than “Keeping 
Quiet”, it seems reasonable to attempt to solve games by eliminating poor 
strategies for each player. 

Definition 4.11 

A strategy for player 1, cti, is strictly dominated by a'l if 
7 tl(cr'i,CT2) > 7Ti(cri,CT2) V(T2 G S2 • 

That is, whatever player 2 does, player 1 is always better off using rather 
than a\. Similarly, a strategy for player 2, (J 2 , is strictly dominated by a '2 if 

712(0-1, CT2) > 712(0-1, CT2) Vo-i G El . 



Definition 4.12 

A strategy for player 1, cti, is weakly dominated by if 
7Ti(o-'i, 0 - 2 ) > 7Ti(o-i, 0 - 2 ) Vo -2 G S 2 

and 

3 ct 2 G S 2 S.t. 7 Ti(o-(,CT2) > 7Ti(o-i,CT2) . 

A similar definition applies for player 2. 

We have already solved the Prisoners’ Dilemma by the elimination of strictly 
dominated strategies. The following example illustrates the solution of a game 
by the elimination of weakly dominated strategies. 

Example 4.13 

Consider the following game. 



P2 





L 


R 


u 


3,3 


2,2 


D 


2,1 


2,1 
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For player 1, U weakly dominates D and, for player 2, L weakly dominates R. 
Consequently, we expect that player 1 will not play D and player 2 will not 
play R, leaving the solution (C, L). 

To solve a game by the elimination of dominated strategies we have to 
assume that the players are rational. However, we can go further, if we also 
assume that: 

1. The players are rational. 

2. The players all know that the other players are rational. 

3. The players all know that the other players know that they are rational. 

4. . . . (in principle) ad infinitum. 

This chain of assumptions is called Common Knowledge of Rationality, or CKR. 
It encapsulates the idea of being able to “put oneself in another’s shoes” . By 
applying the CKR assumption, we can solve a game by iterating the elimination 
of dominated strategies. 



Example 4.14 

Consider the following game: 



P2 





L 


M 


R 


u 


1,0 


1,2 


0,1 


D 


0,3 


0,1 


2,0 



Initially player 1 has no dominated strategies. For player 2, R is dominated by 
M. So R is eliminated as a reasonable strategy for player 2. Now, for player 1, 
D is dominated by U . So D is eliminated as a reasonable strategy for player 1. 
Now, for player 2, L is dominated by M. Eliminating L, leaves {U,M) as the 
unique solution. (The levels of CKR listed explicitly above have been used in 
this example.) 

There is a problem with the iterated elimination of dominated strategies 
when it comes to dealing with weakly dominated strategies: the solution may 
depend on the order in which strategies are eliminated. 

Example 4.15 

Consider the following game: 
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P2 





L 


M 


R 


u 


10,0 


5,1 


4,-2 


D 


10,1 


5,0 


1,-1 



Order 1: Eliminate D for player 1. Now eliminate L and R for player 2. The 
remaining strategy pair {U,M) is postulated as the solution, but using a differ- 
ent order of elimination we arrive at a different result. Order 2: Eliminate R for 
player 2. Neither player now has any dominated strategies, so stop. There are 
four remaining strategy pairs which could be the solution to the game, namely 
(U,L), (U,M), (D,L) and (D,M). 



Exercise 4.2 



Solve the following abstract games using the (iterated) elimination of 
dominated strategies. For the second game, does the solution depend on 
the order of elimination? 



(a) P2 





L 


R 


U 


3,0 


2,1 


D 


2,1 


1,0 



(b) P 2 





L 


R 


U 


0,3 


10,2 


c 


10,4 


0,0 


D 


3,1 


3,1 



4.4 Nash Equilibria 

The next example shows that some games can only be trivially solved using 
the (iterated) elimination of dominated strategies. 



Example 4.16 

Consider the game: 



P2 





L 


M 


R 


U 


1,3 


4,2 


2,2 


c 


4,0 


0,3 


4,1 


D 


2,5 


3,4 


5,6 



From the start, neither player has any dominated strategies leading to the 
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maximally imprecise prediction that anything can happen. (It is in this sense 
that the solution is “trivial”.) 

Nevertheless, there is an “obvious” solution to this game, namely {D,R), 
which maximises the payoff to both players. Is it possible to define a solution in 
terms of something other than the (iterated) elimination of dominated strate- 
gies that both identifies such obvious solutions and keeps many of the results 
derived using dominance techniques? Fortunately, the answer to this question is 
“yes”: such a solution can be provided by the definition of a Nash equilibrium. 



Definition 4.17 

A Nash equilibrium (for two player games) is a pair of strategies (o'J,(T 2) such 
that 

0-2) > 7ri((Ti, (T2) Vcti G Si 

and 

712(0-1, CT2) > 712(0-1, CT2) Vo-2 G S2 . 

In other words, given the strategy adopted by the other player, neither player 
could do strictly better (i.e., increase their payoff) by adopting another strategy. 

Example 4.18 

Consider the game from Example 4.16. Let = R and let CTi = (p, q, 1 —p—q) 
(that is, 0-1 is an arbitrary strategy that specifies using U with probability p, 
C with probability q and D with probability 1 — p — q). Then 

TTi{ai,R) = 2p + -iq + b{l-p-q) 

= 5 — 3p — q 

< 5 

= tti{D,R) . 

Now let al = D and let 0-2 = {p,q,l — p — q). Then 

7 T2(D,CT2) = 5p + 4q + 6{1 — p — q) 

= 6 — p — 2q 

< 6 

= TT2{D,R) . 

Consequently the pair (D, R) constitutes a Nash equilibrium. 
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Exercise 4.3 

Consider the following game. Show that (D,L) and {U,M) are Nash 
equilibria. 



P2 





L 


M 


R 


u 


10,0 


5,1 


4,-2 


D 


10,1 


5,0 


1,-1 



It is clear from Definition 4.17 and the previous exercise that a Nash equi- 
librium never includes strictly dominated strategies, but it may include weakly 
dominated strategies. 

An alternative form of the definition of a Nash equilibrium is useful for find- 
ing Nash equilibria rather than just checking that a particular pair of strategies 
is a Nash equilibrium. First we define the concept of a best response strategy. 

Definition 4.19 

A strategy for player 1, cti, is a best response to some (fixed) strategy for player 
2, (72, if 

CTi G argmax7Ti((Ti, (T 2 ) . 

criGSl 

Similarly, 02 is a best response to some a\ if 

CT2 G argmax7T2(cri, (T 2 ) . 

< 72^^2 



An equivalent form of the definition of a Nash equilibrium, which focusses 
on the strategies rather than the payoffs, is that is a best response to crj 
and vice versa. 



Definition 4.20 



A pair of strategies (ct*,(J 2 ) is a Nash equilibrium if 



(Tj G argmax7ri((Ti, (T 2 ) 

and 

CT 2 G argmax7T2(crj‘, CT 2 ) . 

< 72^^2 



It is clear that a strictly dominated strategy is never a best response to 
any strategy, whereas a weakly dominated strategy may be a best response to 
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some strategy. This is why weakly dominated strategies may appear in Nash 
equilibria but strictly dominated strategies do not. 

To use this definition to find Nash equilibria we find, for each player, the 
set of best responses to every possible strategy of the other player. We then 
look for pairs of strategies that are best responses to each other. 

Example 4.21 (Matching Pennies) 

Two players each place a penny^ on a table, either “heads up” (strategy H) 
or “tails up” (strategy T). If the pennies match, player 1 wins (the pennies); if 
the pennies differ, then player 2 wins (the pennies). 



P2 





H 


T 


H 


+ 1, —1 


— 1, +1 


T 


— 1, +1 


+ 1, —1 



Clearly, this is a game in which the two players have completely opposing 
interests: one player only wins a penny when the other loses a penny. Because 
a penny is a small amount of money (and anyway the coins may be used only 
as a token for playing, with each player retaining their own coin), the payoff 
may be interpreted as a utility (based on the pleasure of winning) of +1 for 
winning the game and a utility of — 1 for losing. 

We can easily check that there is no pure strategy pair that is a Nash equi- 
librium: {H,H) is not an equilibrium because P 2 should switch to T; (H,T) is 
not an equilibrium because P\ should switch to T; (T, H) is not an equilibrium 
because P\ should switch to i7; and, finally, (T,T) is not an equilibrium be- 
cause P 2 should switch to P[. (Intuitively, the solution is obvious: each player 
should randomise - by tossing the penny - and play H with probability |.) 

Let us consider the mixed strategies CTi = (p, 1 — p) for player 1 and a 2 = 
{q, l — q). That is, player 1 plays “Heads” with probability p and player 2 plays 
“Heads” with probability q. The payoff to player 1 is 

7Ti(cri,CT2) = pq-p{l-q)-{l-p)q+{l-p){l-q) 

= l-2q + 2p{2q - 1) 

Clearly, if g < | then player I’s best response is to choose p = 0 (i.e., di = (0, 1) 
or “play Tails”). On the other hand, if q > | then player I’s best response is 
to choose p = 1 (i.e., u\ = (1,0) or “play Heads”), li q = \ then every mixed 
(and pure) strategy is a best response. 

® In parts of Europe, you could use cents. 
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Now consider the payoff to player 2. 

7T2(cti,CT2) = -pq + p{l-q) + {l-p)q-{l-p){l-q) 

= — 1 + 2p + 2f/(l — 2p) 

Clearly, if p < | then player 2’s best response is to choose <7=1 (i.e., (T 2 = (1, 0) 
or “play Heads”). On the other hand, if p > | then player 2’s best response is 
to choose q = 0 (i.e., <72 = (0, 1) or “play Tails”). If p = | then every mixed 
(and pure) strategy is a best response. 

So the only pair of strategies for which each is best response to the other is 
(T* = (72 = (^, |). That is, 

- '1 n /I r 
^2’ 27 ’ V2’ 2^ 

is a Nash equilibrium and the expected payoffs for each player are 



.^1 5 <^ 2 \ — 



= 7T2(ft)‘,CT2) = 0 . 



Remark 4.22 

In contrast to single-player decision models (see Theorem 1.32), there is no 
solution to the Matching Pennies game involving only non-randomising strate- 
gies. In any given realisation of the Matching Pennies game, the outcome will 
be one of {H,H), (H,T), (T,H), or (T,T) each with probability The out- 
come of a game occurs as a result of the strategies chosen by the players, but 
a player’s strategy is not the same as a choice of outcome. 

Exercise 4.4 

Find all the Nash equilibria of the following games. 



(a) P 2 





L 


R 


u 


4,3 


2,2 


D 


2,2 


1,1 



(b) P 2 





R 


W 


F 


0,0 


2,1 


M 


1,2 


0,0 



We can often simplify the process of finding Nash equilibria by making use 
of the next two theorems. The first of these theorems makes it easy to find 
pure-strategy Nash equilibria. 



Theorem 4.23 

Suppose there exists a pair of pure strategies (s*, such that 
7ri(s);,S2) > 7 Ti(si,S 2) Vsi G Si 
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and 7T2 (s*,S2) ^ 

Then (s^jS^) is a Nash equilibrium. 

Proof 

For all (Ti G Si we have 

7ri((Tl,S2) = 

< 

For all (72 G S 2 we have 

< 

Hence (sijS^) is a Nash equilibrium. 



7 T 2 (5^,52) VS 2 G S2 . 



^ P(s)7Ti(si,S2) 
seSi 

^ p(s)7Tl(sJ,S^) 
sGSi 

7Ti(Si, S 2 ) . 

^ g(s)7T2(Si,S2) 
sGSa 

^ g(s)7Ti(Si,s;) 
sGSa 

7T2(Si,S2) . 

□ 



Example 4.24 

Consider again the game from Example 4.16 



P2 





L 


M 


R 


u 


1,3 


4,2 


2,2 


c 


4,0 


0,3 


4,1 


D 


2,5 


3,4 


5,6 



Payoffs corresponding to a pure strategy that is a best response to one of 
the opponent’s pure strategies are underlined. Two underlinings coincide in 
the entry (5,6) corresponding to the strategy pair {D,R). The coincidence of 
underlinings means that H is a best response to R and vice versa (i.e., the pair 
of pure strategies (H, R) is a Nash equilibrium). 

Exercise 4.5 

Find the pure strategy Nash equilibria for the following game. 
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P2 





L 


M 


R 


u 


4,3 


2,7 


0,4 


D 


5,5 


5,-1 


-4,-2 



Exercise 4.6 

A man has two sons. When he dies, the value of his estate (after tax) is 
£1000. In his will it states that the two sons must each specify a sum of 
money Si that they are willing to accept. If si + S 2 < 1000, then each 
gets the sum he asked for and the remainder (if there is any) goes to the 
local home for spoilt cats. If si + S 2 > 1000, then neither son receives 
any money and the entire sum of £1000 goes to the cats’ home. Assume 
that (i) the two men care only about the amount of money they will 
inherit, and (ii) they can only ask for whole pounds. Find all the pure 
strategy Nash equilibria of this game. 

In the process of finding the Nash equilibrium in the Matching Pennies 
game (see Example 4.21), we saw that, for each player, any strategy was a best 
response to the Nash equilibrium strategy of the other player. In particular, 
the payoff for playing H is equal to the payoff for playing T. Intuitively, the 
reason for this is obvious: if the payoffs were not equal, then player i could 
do better than the supposed mixed Nash equilibrium strategy a* by playing 
the pure strategy that assigns probability 1 to whichever of or T gives the 
higher payoff. The following theorem shows that this result is generally true 
for all two-player games. 



Definition 4.25 

The support of a strategy cr is the set S(<t) C S of all the strategies for which 
a specifies p{s) > 0. 

Example 4.26 

Suppose an individual’s pure strategy set is S = {L,M,R}. Consider a mixed 
strategy of the form a = {p,l — p, 0) where the probabilities are listed in the 
same order as the set S and 0 < p < 1. The support of a is S(cr) = {L,M}. 

Theorem 4.27 (Equality of Payoffs) 

Let (crTcr^) be a Nash equilibrium, and let S); be the support of Then 
7Ti(s, crj) = 7ri((j*,cr5) Vs G SJ. 
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Proof 

If the set S* contains only one strategy, then the theorem is trivially true. 
Suppose now that the set SJ contains more than one strategy. If the theorem 
is not true, then at least one strategy gives a higher payoff to player 1 than 
7ri(crJ, crj). Let s' be the action that gives the greatest such payoff. Then 

ses* 

= +P*(s')7ri(s',CT2) 

s^s' 

< '^P*{s)Tri{s',a2)+p*{s')TTi{s',<T2) 

s^s' 

= 7ri(s',CT2) 

which contradicts the original assumption that (crJjCrJ) is a Nash equilibrium. 

□ 

The corresponding result for player 2 also holds. Namely, if (J 2 has support 
S 2 , then 

7r2(cr*,s) = 7r2(CT*,(T2) Vs G 83 . 

The proof is analogous. 

Remark 4.28 

Because all strategies s G S* give the same payoff as the randomising strategy 
CT*, why does player 1 (or indeed player 2) randomise? The answer is that, 
if player 1 were to deviate from this strategy, then would no longer be a 
best response and the equilibrium would disintegrate. This is why randomising 
strategies are important for games, in a way that they weren’t for the single- 
player optimisation problems covered in Part I. 

We can use Theorem 4.27 to find mixed strategy Nash equilibria. 

Example 4.29 

Consider the Matching Pennies game in Example 4.21. Suppose player 2 plays 
H with probability q and T with probability 1 — q. If player 1 is playing a 
completely mixed strategy at the Nash equilibrium, then 



7ri(iL, CT 2 ) 

qm{H,H) + {l-q)n^{H,T) 



7Tl(r,Cr2) 

(77n(T,i7) + (l-(7)7ri(T,T) 
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-q+{l-q) 

2 
1 

2 ■ 

The same argument applies with the players swapped over, so the Nash equi- 
librium is (crj, crj) with CT* = crj = (^, |) as we found before. 

Exercise 4. 7 

Consider the children’s game “Rock-Scissors-Paper” , where 2 children 
simultaneously make a hand sign corresponding to one of the three items. 
Playing “Rock” (R) beats “Scissors” (S), “Scissors” beats “Paper” (P), 
and “Paper” beats “Rock”. When both children play the same action 
(both R, both S, or both P) the game is drawn, (a) Construct a payoff 
table for this game with a payoff of -1-1 for a win, —1 for losing, and 0 
for a draw, (b) Solve this game. 



g-(l-g) = 

^ 4g = 



q = 



4.5 Existence of Nash Equilibria 

John Forbes Nash Jr. proved the following theorem in 1950 as part of his PhD 
thesis, which is why equilibrium solutions to games are called “Nash equilibria” . 

Theorem 4.30 (Nash’s Theorem) 

Every game that has a finite strategic form (i.e., with finite number of players 
and finite number of pure strategies for each player) has at least one Nash 
equilibrium (involving pure or mixed strategies). 



Remark 4.31 

A general proof of Nash’s theorem relies on the use of a fixed point theorem 
(e.g., Brouwer’s or Kakutani’s). Roughly, these fixed point theorems state that 
for some compact set S and a map /: S — >■ S that satisfies various conditions, 
the map has a fixed point, i.e., that f{p) = p for some p € S. The proof of 
Nash’s theorem then amounts to showing that the best response map satisfies 
the necessary conditions for it to have a fixed point. Rather than spending a 
great deal of effort to prove one of the fixed point theorems, it seems preferable 
to restrict our attention to a class of games that is common and for which it 
is easy to provide a self-contained proof. We refer the interested reader to the 
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more general proofs contained in the books by Fudenberg & Tirole (1993) and 
Myerson (1991). 

Proposition 4.32 

Every two player, two action game has at least one Nash equilibrium. 



Proof 



Consider a two player, two action game with arbitrary payoffs: 



P2 





L 


R 


u 


a, b 


c, d 


D 


e,/ 


9,h 



First we consider pure-strategy Nash equilibria: if a > e and b > d then (U,L) 
is a Nash equilibrium; if e > a and f > h then (D, L) is a Nash equilibrium; 
if c ^ and d > b then (U, R) is a Nash equilibrium; if g > c and h > f then 
{D, R) is a Nash equilibrium. There is no pure strategy Nash equilibrium if 
either 



1. a < e and f < h and g < c and d <b, or 

2. a > e and f > h and g > c and d > b. 

In these cases, we look for a mixed strategy Nash equilibrium using the Equality 
of Payoffs theorem (Theorem 4.27). Let a* = (p* , 1 — p*) and = (g*, 1 — q*). 
Then 



and 



TTi{U,al) 
aq* + c(l — q*) 



7ri(D,CT2) 
eq* + 5(1 - q*) 

(c-g) 

(c - g) + (e - a) 



7T2((Ti,L) 

bp* + f(l-p*) 



7T2((Ti,R) 

dp* + h(l-p*) 

(h-f) 

{h-f) + {b-d) 



In both cases, we have 0 < p*,g* < 1 as required for a mixed strategy Nash 
equilibrium. □ 
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Exercise 4.8 

A general symmetric, 2 player, two strategy game has a payoff table 



P2 





A 


B 


A 


a, a 


b, c 


B 


c, b 


d, d 



Show that such a game always has at least one symmetric Nash equilib- 
rium. 



4.6 The Problem of Multiple Equilibria 

Some games have multiple Nash equilibria and, therefore, more than one pos- 
sible solution. 



Example 4.33 (Battle of the Sexes) 



This is the classic example of a coordination game.^ One modern version of the 
story is that a married couple are trying to decide what to watch on television. 
The husband would like to watch the football match and the wife would like 
to watch the soap opera. The total values of their utilities are made up of 
two increments. If they watch the programme of their choice, they get a utility 
increment of 1 (and zero otherwise). If they watch television together, each gets 
a utility increment of 2, whereas they get zero if they watch television apart - 
obviously they must be a rich couple with two TVs. So, using the pure strategy 
set S = “watch soap opera” and F = “watch football” , the payoff table is 



Husband 



Wife 





F 


s 


F 


3,2 


1,1 


S 


0,0 


2,3 



Clearly, this game has two pure-strategy Nash equilibria: (F,F) and (S,S). 

^ For biologists, the “Battle of the Sexes” is a different game - one that has no 
pure-strategy Nash equilibria. See Maynard Smith (1982). 
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There is also a mixed strategy Nash equilibrium with 

a: = (g(F),g(5))= 

This mixed strategy Nash equilibrium can be found using the Equality of Pay- 
offs theorem (Theorem 4.27) or the best response method of Section 4.4 (which 
also finds the two pure-strategy equilibria). 

In this game there is a problem with deciding what strategies will be adopted 
by the players. How should the players decide between these three Nash equilib- 
ria? Can they both decide on the same one? (This is not a problem with Game 
Theory itself: it just demonstrates that even simple interactive decision prob- 
lems do not necessarily have simple solutions.) Note that for the randomising 
Nash equilibrium, the asymmetric outcomes can occur. The most likely outcome 
of the game if both players randomise is (E, S), which occurs with probability 
despite the fact that both players would prefer to coordinate. 

Responses to the existence of multiple Nash equilibria have included: 

1. Using a convention. For example, in the Battle of the Sexes, possible con- 
ventions are 

a) The man will get what he wants, because women are generous. 

b) The man should defer to the woman, because that’s what a gentleman 
should do. 

c) ... 

This then leads to the question of which convention will be used and to the 
development of game-theoretic models of convention formation. 

2. Refine the definition of a Nash equilibrium to eliminate some of the equilib- 
ria from consideration. There have been several attempts to do this (“trem- 
bling hand perfection”, etc.) but, despite the inherent interest of such re- 
finements, they do not succeed in eliminating all but one equilibrium in 
every case either one, many or (unfortunately) no refined equilibria may 
exist. 

3. Invoke the concept of evolution: there is a population of players who pair 
up at various points in time to play this game. The proportion of players 
using any given strategy changes over time depending on the success of that 
strategy (either successful strategies are consciously imitated, or Natural 
Selection sorts it out). The evolutionary interpretation of Nash equilibria 
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can be viewed as a refinement of the Nash equilibrium concept (because 
it favours some equilibria over others). However, it is also important in 
its own right because of the application of game theory to evolutionary 
biology. 



4.7 Classification of Games 

4.7.1 AfRne Transformations 

If it is only the equilibrium strategies, and not the payoffs, which are of interest, 
then it is possible to convert a difficult calculation into a simpler one by means 
of a generalised affine transformation. 

Definition 4.34 

A generalised affine transformation of the payoffs for player 1 is 

S2) = OliTTi{si, S2) + /?l(s 2 ) Vsi G Si 

where > 0 and /3i(s2) G Note that we may apply a different transfor- 
mation for each possible pure strategy of player 2. Similarly, an affine transfor- 
mation of the payoffs for player 2 is 

TTafsi, S 2 ) = a27T2(si, S 2 ) -I- /32(si) Vs 2 G S 2 . 



Example 4.35 

The game 



P2 





L 


R 


u 


3,3 


0,0 


D 


-1,2 


2,8 



can be transformed into 



P2 





L 


R 


U 


2,1 


0,0 


D 


0,0 


1,2 



A standard affine transformation has Pffi) = constant. 



5 
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by applying the affine transformations 

oi = - Pi(L) = i f^i(R) = 0 

«2 = 2 f^2{U) = 0 /32(-D) = -g ■ 

Exercise 4.9 

Demonstrate by explicit calculation that the two games in Example 4.35 
have the same Nash equilibria. 

Theorem 4.36 

If the payoff table is altered by generalised affine transformations, the set of 
Nash equilibria is unaffected.® 






82 



4. Static Games 



4.7.2 Generic and Non-generic Games 
Definition 4.37 

A generic game is one in which a small change to any one of the payoffs^ 
does not introduce new Nash equilibria or remove existing ones. In practice, 
this means that there should be no equalities between the payoffs that are 
compared to determine a Nash equilibrium. 

Most of the games we have considered so far (the Prisoners’ Dilemma, 
Matching Pennies, the Battle of the Sexes) have been generic. The following is 
an example of a non-generic game. 



Example 4.38 

Consider the game 



P2 





L 


M 


R 


u 


10,0 


5,1 


4,-2 


D 


10,1 


5,0 


1,-1 



This game is non-generic because (D, L) is obviously a Nash equilibrium, but 
player 1 would get the same payoff by playing U rather than D (against L). 
Similarly, (U, M) is obviously a Nash equilibrium, but player 1 would get the 
same payoff by playing D rather than U (against M). 

Theorem 4.39 (Oddness Theorem) 

All generic games have an odd number of Nash equilibria. 



Remark 4.40 

A formal proof of the oddness theorem is rather difficult. Figure 4.1 shows 
the best responses for the Battle of the Sexes game. The best response for 
player 1 meets the best response for player 2 in three places. These are the 
Nash equilibria. Drawing similar diagrams for other generic games supports 
the truth of this theorem (at least for games between two players, each with 
two pure strategies). 



7 



So this is not an afhne transformation. 
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1 



q 

1 

4 

0 



Figure 4.1 Battle of the Sexes. The best responses for player 1 are shown by 
a solid line and those for player 2 by a dotted line. Where they meet are the 
Nash equilibria (circled). 

In contrast, the number of Nash equilibria in a non-generic games is (usu- 
ally) infinite. 

Example 4.41 

Consider the game shown in Example 4.38. Define CTi = (p, 1 — p) and (T 2 = 
{q,r,l — q — r). Then 

7ri(cri,CT2) = 1 + 9q + 4r + 5p{l — q — r) 

7T2(cri,cr2) = -(H-p) -|-2g-|-r(l-|-2p) 

The best responses are 

_ f (1, 0) if g -I- r < 1 

( (x, 1 — x) with a: G [0, 1] if g -I- r = 1. 

( (1,0,0) if P < 5 

(72 = < (0, 1,0) if P > 5 

[ (y, 1 - y, 0) with 2 / G [0, 1] ifp=|. 

So the Nash equilibria are 

1. a* = (x,l — x) with X G [0, |) and ctJ = (1, 0, 0) 

2. CT* = (a:, 1 — x) with x G (|, 1] and ctJ = (0, 1, 0) 

3. CT* = (1, 1) and = (y, 1 - y,0) with y G [0, 1] 
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Note that the strictly dominated strategy R = (0,0, 1) is not included in any 
of these Nash equilibria. 

Exercise 4.10 

Find all the Nash equilibria for the following non-generic games. Draw 
the best response graphs for the first game. 



(b) 

Pi 

Sometimes, however, the number of Nash equilibria in a non-generic game 
may be finite and even. 

Exercise 4.11 

Consider the following game. Find all the Nash equilibria for every value 
of A G (— oo, -l-oo). 



a) P 2 





C 


D 


A 


6,0 


5,3 


B 


6,1 


0,0 



P2 





B 


F 


H 


G 


5,0 


-1,1 


2,0 


J 


5,3 


-2,3 


2,3 



P2 





L 


R 


U 


A, A 


1,1 


D 


1,1 


2,2 



4.7.3 Zero-sum Games 

As its name suggests, a zero-sum game is one in which the payoffs to the 
players add up to zero. For example, the game “Matching Pennies” is a zero 
sum game: if the first player uses a strategy cti = (p, 1 — p) and the second uses 
CT 2 = (< 7 , 1 — q) then their payoffs are 

7Ti(cri,CT2) = pq - p{l - q) -i- {1 - p)q - {1 - p){l - q) 

= (2g-l)(2p-l) 

= — ■7T2(cri, (T2) 

In such games the interests of the players are, therefore, exactly opposed: one 
only wins what the other loses. This is in contrast to many other games - such 
as the Prisoners’ Dilemma in which both players end up wining (or losing) the 
same amount. 

Zero-sum games were the first type of game to be studied formally. At that 
time, the concept of a Nash equilibrium did not exist, and games were solved 
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by finding what was referred to as the “minimax” or (“maximin”) solution. 
Fortunately, the minimax solution is just the Nash equilibrium for a zero-sum 
game. Let us define 7r((Ti,(T2) = 7 Ti((Ti, (T 2 ), so 7T2(ai,cr2) = —7r(cri,a2) (in a 
zero-sum game). Then the Nash equilibrium conditions 

7Ti(a*,a2) > 7Ti(ai,o-2) Vcri G Si 

and 7T2((T*,(72) > 7r2(CT*,(J2) V(T2 G S 2 

can be rewritten as 

7t((Ti,CT 2) = max 7r(cri,CT2) 

fTi GSi 

and 7 t((Ti,CT 2) = min 7r(cr*,CT2). 

(T2G^2 

(Remember that, to maximise their own payoff, the second player must min- 
imize the first player’s payoff.) By noting that each player should play a best 
response to the other’s strategy, these two conditions can be combined 

7t((Ti,(T 2) = max 7 t((Ti, (T 2 ) 

(TiGSi 

= max min 7r(ai,(72) 

criGl^i (T2G^2 

or, equivalently 

7r(a*,a2) = min 7r(a*,a2) 

(T2G^2 

= min max 7r(cri,(T2) . 

(T2^^2 

Exercise 4.12 

“Ace-King-Queen” is a simple card game for two players, which is played 
as follows. The players each bet a stake of $5. Each player then chooses 
a card from the set {Ace, King, Queen} and places it face down on the 
table. The cards are turned over simultaneously, and the winner of the 
hand is decided by the following rules: an “Ace” (A) beats a “King” 
(K); a “King” beats a “Queen” (Q); and a “Queen” beats an “Ace”. The 
winning player takes the $10 in the pot. If both players choose the same 
card (both A, both K, or both Q), the game is drawn and the $5 stake 
is returned to each player. What is the unique Nash equilibrium for this 
game? 

Theorem 4.42 

A generic zero-sum game has a unique solution. 
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Proof 

See Von Neuman & Morgenstern (1953). □ 

Exercise 4.13 

Consider the game shown below. Show that if the game is generic (i.e., 
a ^ b, a c, etc.), then there is a unique Nash equilibrium. [Hint: there 
are sixteen possible cases to consider.] 





C 


D 


A 


a, —a 


b,-b 


B 


c, —c 


d, —d 



4.8 Games with n-players 

The extension of the theory to games with more than two players is straight- 
forward, if notationally baroque. Let us label the players by i € {1,2, ... ,n} 
Each player has a set of pure strategies Si and a corresponding set of mixed 
strategies Si. The payoff to player i depends on a list of strategies cti, (T 2 , • • • , cr„ 
- one for each player. For the definition of a Nash equilibrium, we will need to 
separate out the strategy for each of the players, so we denote by cr_i the list 
of strategies used by all the players except the z-th player. 

Example 4.43 

Consider a game with three players. The payoffs to each player can be written 
as: 

7ri(<Tl, Cr_l) = 7ri((Tl, ( 72 , (T 3 ) 

7T2((72, (7-2) = 7T2((7i, (72, (73) 

773(f73, (7_3) = 7T3((7 i, (72, ( 73 ). 

Suppose player z uses a mixed strategy Ui which specifies playing pure strat- 
egy s G Si with probability Pi(s). Payoffs for mixed strategies are then calcu- 
lated from the payoff table with entries 7Ti(si, . . . , s„) by 

7Ti(CTi,CT_i)= ^ ••• ^ Pl(si)---p„(s„)7Ti(si,...,S„) 

SiGSi Sfi^^n 
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P2 



A 


L 


R 


U 


1,1,0 


2,2,3 


D 


2,2,3 


3,3,0 



P2 



B 


L 


R 


U 


-1,-1, 2 


2,0,2 


D 


0,2,2 


1,1,2 



Figure 4.2 A representation of the three player game from example 4.45. 



Definition 4.44 

A Nash equilibrium in a n-player game is a list of mixed strategies ct*, uj j • ■ • ) 
such that 

a* G argmaxTTi (cTj, cr* j) Vi G {1,2,..., n} 

CTiGSi 



Example 4.45 



Consider a static three-player game where the first player chooses between U 
and D, the second player chooses between L and R, and the third player chooses 
between A and B. Instead of trying to draw a three-dimensional payoff table, 
we represent this game by a pair of payoff tables such as the ones shown in 
Figure 4.2. (We can interpret this as player 3 choosing the game that players 
1 and 2 have to play, so long as we remember that players 1 and 2 do not 
know which of the payoff tables player 3 has chosen.) We can find a Nash 
equilibrium for the game with the payoffs shown in Figure 4.2 as follows. First, 
suppose that player 3 chooses A. Then the best responses for players 1 and 2 
are the strategies a\ = = (0, 1). However, we do not have a Nash equilibrium 

because choosing A is not player 3’s best response to this pair of strategies. 
Now suppose that player 3 chooses B. Then the best responses for players 1 
and 2 are the strategies cti = (J 2 = (|, |). Because player 3 would get a payoff 
of I if he switches to A, we have a Nash equilibrium (cr{, cr^ , ctJ) with 



^* = 1^ if 
1 ^2’2^ 



2 ^2’ 2 ^ 



a; = (0,1) 



Exercise 4.14 

Represent the game from Example 4.45 by a pair of payoff tables “cho- 
sen” by player 2. Confirm that the game has the same Nash equilibrium 
when represented in this way. [Hint: show that there are no pure strategy 
Nash equilibria, then use the Equality of Payoffs theorem to find a Nash 
equilibrium involving mixed strategies.] 







5 

Finite Dynamic Games 



5.1 Game Trees 

So far we have considered static games in which decisions are assumed to be 
made simultaneously (or, at least, in ignorance of the choices made by the other 
players). However, there are many situations of interest in which decisions are 
made at various times with at least some of the earlier choices being public 
knowledge when the later decisions are being made. These games are called 
dynamic games because there is an explicit time-schedule that describes when 
players make their decisions. 

Dynamic games can be represented by a game tree - the so-called extensive 
form - which is an extension of the decision tree used in (single-person) decision 
theory. The times at which decisions are made are shown as small, filled circles. 
Leading away from these decision nodes is a branch for every action that could 
be taken at that node. When every decision has been made, one reaches the 
end of one path through the tree. At that point, the payoffs for following that 
path is written. We will use the convention that the first payoff in each pair is 
for the player who moves first. Time increases as one goes down the page, so 
the tree is drawn “upside-down”. 



Example 5.1 (Dinner Party Game) 

Two people (“husband” and “wife”) are buying items for a dinner party. The 
husband buys either fish {F) or meat (M) for the main course; the wife buys 
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Figure 5.1 The game tree for the dinner party game of Example 5.1. 

either red wine (i?) or white wine (W). Both people are rather conventional and 
prefer red wine with meat and white wine with fish, rather than either of the 
opposite combinations, which are equally displeasing. However, the husband 
prefers meat over fish, while the wife prefers fish over meat. We can represent 
these preferences as utility-based payoffs: 

7Th{M,R)=2 TTh{F,W) = l nh{F,R)=7Th{M,W) = 0 

tt^{M,R) = 1 ttUF,W) = 2 n^{F,R)=TTUM,W)=0 

where the payoffs for the husband have been given a subscript h and those for 
the wife a subscript w. So far the description of the game has been no different 
from that of a static game. Let us now assume that the husband buys the main 
course and tells his wife what was bought; his wife then buys some wine. The 
game tree for this game is shown in Figure 5.1. 



What is the solution of the dinner party game? The obvious way to solve 
this game is by backward induction (i.e., to work backwards through the game 
tree). Recall that the husband tells his wife whether fish or meat has been 
purchased. So when she makes her decision about the wine, she knows what 
main dish her husband will be cooking. If the husband has bought fish, then 
his wife will buy white wine (because this gets her a payoff = 2, rather than a 
payoff = 0 for having red wine with fish). On the other hand, if the husband 
has bought meat, then his wife will buy red wine (a payoff = 1 rather than a 
payoff = 0 for white wine with meat). So if the husband buys fish, then his 
wife will buy white wine and he will get a payoff = 1 . On the other hand, if the 
husband buys meat, then his wife will buy red wine and he will get a payoff 





5.2 Nash Equilibria 



91 



Pi 




Figure 5.2 Game tree for Exercise 5.1. 

= 2. So the husband prefers to buy meat and the dinner party will consist of 
meat and red wine (and, hopefully, some other items). 

Exercise 5.1 

Find a solution using backward induction for the game shown in Fig- 
ure 5.2. 



5.2 Nash Equilibria 

Is the solution we have just derived a Nash equilibrium? To answer this ques- 
tion, we have to determine what the strategies are for each player. The action 
sets for each player are = {M, F} and A„, = {R, W}. The set of pure strate- 
gies available to the husband is the same as his action set: = {M, F}. How- 

ever, the wife has four possible pure strategies: S„, = {RR, RW,WR,WW} 
where ^ 

RR = “i? if her husband chooses M and R if he chooses F” 

RW = “R if her husband chooses M and W if he chooses F” 

WR = “W if her husband chooses M and R if he chooses F” 

WW = “W if her husband chooses M and W if he chooses F” 

So in strategic (or normal) form the game has the following payoff table (with 
best responses underlined). 

^ The wife’s strategy set clearly illustrates the difference between actions and strate- 
gies - a distinction that cannot be made in static games. 
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Husband 



Wife 





RR 


RW 


WR 


WW 


M 


2,1 


2,1 


0,0 


0,0 


F 


0,0 


1,2 


0,0 


1,2 



Clearly there are three pure strategy Nash equilibria: (M, RR), (M, RW), and 
{F, WW). Any pair (M, ctJ) where crj assigns probability p to RR and 1 — p to 
RW is also a Nash equilibrium. So the solution we found by backward induction 
is a Nash equilibrium but there are many others. 

Although there are many Nash equilibria, not all are equally believable if 
we consider what they imply for the behaviour of one of the players. Consider 
the wife’s strategy WW. This corresponds to the wife telling her husband that 
if he buys meat she will, nevertheless, buy white wine. In response to this 
announcement her husband should buy the fish that his wife prefers because, 
if he does not, he will get one of his least preferred outcomes (white wine 
with meat). However, if her husband has bought meat, then the wife should 
buy red wine when she comes to her decision. This is because she prefers 
red wine compared to white wine when the dinner is based on meat. So the 
husband should not believe his wife when she threatens to buy white wine if he 
buys meat. In other words, the Nash equilibrium (F, WW) relies on the wife 
threatening to choose an option she would not take if she were faced with the 
decision of choosing a wine to go with meat. 

Consider, now, the wife’s strategies RR and crj. Both of these specify that 
the wife will buy red wine if her husband buys meat, which is alright. However, 
the first says that the wife would definitely buy red wine if her husband chose 
fish and the second that she may (if p > 0) buy red wine to go with fish. Neither 
of these is believable because the wife definitely prefers white wine with fish. 
(Note that neither of these strategies can be called a “threat” because the 
husband gets his most preferred outcome of meat and red wine in any case.) 

The Nash equilibria found from the strategic form don’t all seem to capture 
the essence of the dynamic game, because the order of the decisions is sup- 
pressed. Rather, a subset of the Nash equilibria - the ones found by backward 
induction on a game tree - seem more reasonable than the others when the 
time structure is taken into account. 

Exercise 5.2 

Consider the game tree shown in Figure 5.3. Solve this game by backward 
induction. Give the strategic form of the game and find all the pure 
strategy equilibria. 
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Pi 




Figure 5.3 Game tree for Exercise 5.2. 

5.3 Information Sets 

The difference between static games and dynamic games is not that the former 
can be represented in strategic form and the latter by a game tree. After all, we 
have just taken a dynamic game and represented it in strategic form in order 
to find all the Nash equilibria and, as we shall see, static games have a game 
tree representation too. 

The real distinction between static and dynamic games is what is known 
by the players when they make their decisions. In the dinner party game, the 
wife knew whether her husband had bought meat or fish when the time came 
for her to choose between red wine and white wine. We formally specify what 
is known to a player by giving their information set. 



Definition 5.2 

An information set for a player is a set of decision nodes in a game tree such 
that: 

1 . the player concerned (and no other) is making a decision; 

2. the player does not know which node has been reached (only that it is one 
of the nodes in the set). 

Note that the second part of this definition requires that a player must have 
the same choices at all nodes included in an information set. 
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Example 5.3 



Consider a static version of the dinner party game. Suppose that, although the 
husband chooses first, his wife does not know what main course ingredient has 
been bought when she is trying to choose a wine. A game tree for this game 
is shown in Figure 5.4(a) where the dotted line joining the wife’s two decision 
nodes represents the fact that she does not know which node she is at (i.e., 
both her decision nodes constitute an information set). The strategic form of 
this game has the payoff table shown below. 



Husband 



Wife 





R 


W 


M 


2,1 


0,0 


F 


0,0 


1,2 



Note that the wife only has two pure strategies in this case because she cannot 
condition her actions on her husband’s behaviour (because she doesn’t know 
it). Clearly, because the husband does not know what his wife will choose, 
she could choose first (and not tell him what wine she has bought) without 
changing the game. Therefore, a second possible game tree for this game is the 
one shown in Figure 5.4(b). 



Exercise 5.3 

Draw two different trees for the static game below. Can any solutions be 
found by backward induction? 



P2 





L 


M 


R 


u 


4,3 


2,7 


0,4 


D 


5,5 


5,-1 


-4,-2 



Exercise 5.4 

Draw the game tree for a version of the Prisoners’ Dilemma where one 
prisoner knows what the other has done. Is the outcome affected by the 
decisions being sequential rather than simultaneous? 
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Figure 5.4 Two possible game trees for the static version of the dinner party 
game of Exercise 5.3. The dotted lines between decision nodes indicate that 
those nodes belong to the same information set. 



5.4 Behavioural Strategies 

In general, a dynamic game a player may encounter an information set con- 
taining two or more decision nodes. At this point, the player does not have 
complete information about the behaviour of their opponent - the two players 
are making “simultaneous decisions”. The presence of such information sets 
means that we must allow for the possibility that players will randomise as 
they might do in a single-decision, static game. 
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Example 5.4 

Consider a game in which player 1 chooses between actions A and B. If ^ is 
chosen, then players 1 and 2 play a game of “matching pennies” . If player 1 
chooses B, then player 2 chooses L or B. The tree for this game is shown in 
Figure 5.5(a). Let us try to solve this game by an extended version of backward 
induction. Let a be the strategy “Play H with probability then (a, a) is the 
unique Nash equilibrium for matching pennies. If we assume that the players 
will indeed play the matching pennies game in this way, we can replace this 
part of the tree with the expected payoffs for the two players - in this case 
(0,0). We then have the truncated game tree shown in Figure 5.5(b). On the 
right-hand side of the tree player 2 should obviously choose i?, which leads to 
the truncated game tree shown in Figure 5.5(c). So, at the start of the game, 
if player 1 chooses A they get an expected payoff of zero. On the other hand, 
if they choose B, they get a payoff of -1. So the backward induction solution is 
that player 1 should use the strategy “A then ct” and player 2 should use “a if 
A, R if i?” . We can shorten this solution without ambiguity to {Aa, aR) . 

As we saw in Section 2.3, there are two ways of defining a randomising strat- 
egy. The randomising strategy we found in the previous example is known as 
a behavioural strategy. In a behavioural strategy, the opportunity for randomi- 
sation (by the appropriate player) occurs at each information set. In working 
backwards through the game tree we found a best response at each information 
set so the end result is an equilibrium in behavioural strategies. The alternative 
is known as a mixed strategy, which is formed by taking weighted combinations 
of pure strategies (see Section 4.2) 

a = ^^p(s)s with ^^p(s) = 1 . 

sGS sGS 

It is randomising strategies defined in the second way that appear in the def- 
inition of a Nash equilibrium. When we wish to distinguish between the two 
sorts of strategy, we will denote a behavioural strategy by the symbol [3. 

In Section 5.2, we saw that the equilibrium in behavioural strategies was 
equivalent to a Nash equilibrium of the strategic form game in a specific exam- 
ple. The next theorem shows that for any equilibrium in behavioural strategies 
there is a Nash equilibrium in mixed strategies that gives the same payoffs to 
both players. 

Theorem 5.5 

Let (/?*, P 2 ) be an equilibrium in behavioural strategies. Then there exist mixed 
strategies al and ctJ such that 
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Figure 5.5 Solution of a general dynamic game containing both sequential 
and simultaneous decisions by backward induction (a) Original game, (b) First 
truncation, (c) Second truncation. (See Example 5.4 for a description of the 
procedure.) 

(a) 7 Ti((T*, 172 )= TTi^Pl, for i = 1,2 and 

(b) the pair of strategies (ci, crj) is a Nash equilibrium. 



Proof 

(a) Consider one of the players. For any fixed strategy of the other player, it 
follows from Theorem 2.16 (by replacing “decision nodes” with “information 
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sets”) that every behavioural strategy has a mixed strategy representation.^ 
Because the two representations assign the same weight to each path through 
the game tree, it follows that 7Ti(CT*, = 7Ti(/3*, /JJ) for z = 1, 2. 

(b) Suppose that, although (/d*,/?^) is an equilibrium in behavioural strategies, 
(ctJ'jctJ) is not a Nash equilibrium. Then one of the players must have an al- 
ternative strategy that yields a higher payoff. Without loss of generality, we 
will assume this is player 1 and call this strategy a'l . Because this strategy has 
a different payoff against ctJ, its behavioural representation must be different 
from /?*. Let us call it /3^. Then 

7ri(/3i,/32) = 7ri(cri,cr2) 

> 7Ti(cri,cr2) 

= 

which contradicts the assumption that (/3*, /SJ) is an equilibrium in behavioural 
strategies. □ 

Now that we have shown that equilibria in behavioural strategies are equiv- 
alent to Nash equilibria, we can drop the distinction between behavioural and 
mixed strategies and denote an arbitrary strategy by a. 

Exercise 5.5 

Find the strategic form of the game from Example 5.4. Find mixed strate- 
gies ai and that give both players the same payoff they achieve by 
using the behavioural strategies found by backward induction. Show that 
the pair is a Nash equilibrium. 

Exercise 5.6 

A firm (the “Incumbent”) has a monopoly in a market worth £6 million. 
A second firm (the “Newcomer”) is thinking of entering this market. 
If the Newcomer does enter the market, the Incumbent can either do 
nothing or start a price war. The cost of a price war is £2 million to each 
firm. If the Newcomer enters then the two firms share the market equally. 
If the Newcomer does not enter then its next best option provides an 
income of £2 million, (a) Draw a game tree for this situation and find an 
equilibrium in behavioural strategies, (b) Construct the Strategic Form 
of this game and find all the Nash equilibria. 

^ We assume that the players have perfect recall - that is, they do not forget the 
decisions they have made in the past. This ensures that each player makes a unique 
sequence of decisions to arrive at any particular information set. 
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5.5 Subgame Perfection 

Using the extended form of backward induction to eliminate “unreasonable” 
Nash equilibria finds what are known as suhgame perfect Nash equilibria. In 
this section, we give a formal definition of subgame perfection. 



Definition 5.6 

A subgame is a part (sub-tree) of a game tree that satisfies the following con- 
ditions. 

1. It begins at a decision node (for any player). 

2. The information set containing the initial decision node contains no other 
decision nodes. That is, the player knows all the decisions that have been 
made up until that time. 

3. The sub-tree contains all the decision nodes that follow the initial node 
(and no others). 

Example 5.7 

In the sequential decision dinner party game of Figure 5.1, the subgames are 
(i) the parts of the game tree beginning at each of the wife’s decision nodes 
and (ii) the whole game tree. 

Example 5.8 

The only subgame of the “simultaneous” decision dinner party game (in either 
version of the game tree shown in Figure 5.4) is the whole game. 

Definition 5.9 

A subgame perfect Nash equilibrium is a Nash equilibrium in which the be- 
haviour specified in every subgame is a Nash equilibrium for the subgame. 
Note that this applies even to subgames that are not reached during a play of 
the game using the Nash equilibrium strategies. 



Example 5.10 

In the dinner party game of Example 5.1, the Nash equilibrium (M, RW) is a 
subgame perfect Nash equilibrium because (i) the wife’s decision in response 
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Figure 5.6 A dynamic game with multiple subgame perfect Nash equilibria. 
See Example 5.11 for a description of the solution. 

to a choice of meat is to choose red wine, which is a Nash equilibrium in that 
subgame; (ii) the wife’s decision in response to a choice of fish is to choose white 
wine (a Nash equilibrium in that subgame); and (ii) the husband’s decision is 
to choose meat, which (together with his wife’s strategy of RW , constitutes a 
Nash equilibrium in the entire game. However, the Nash equilibrium {F, WW) 
is not subgame perfect because it specifies a behaviour (choosing W) that is 
not a Nash equilibrium for the subgame beginning at the wife’s decision node 
following a choice of meat by her husband. 

It follows from the definition of a subgame perfect Nash equilibrium that 
any Nash equilibrium that is found by backward induction is subgame perfect. 
If a simultaneous decision subgame occurs, then all possible Nash equilibria of 
this subgame may appear in some subgame perfect Nash equilibrium for the 
whole game. 

Example 5.11 

Consider the game described by the game tree in Figure 5.6. The simultane- 
ous decision subgame has three Nash equilibria: {C,C), {D,D), and a mixed 
strategy equilibrium (ctJ'jctJ) giving each player a payoff of So the subgame 
perfect Nash equilibria are {AC, CL), {BD, DL), and (HctJjctJT). 
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Figure 5.7 Game tree for Exercise 5.7. 



Theorem 5.12 

Every finite dynamic game has a subgame perfect Nash equilibrium. 



Proof 

The result follows immediately from Definition 5.9 together with Nash’s theo- 
rem. □ 



Exercise 5. 7 

Find all the subgame perfect Nash equilibria for the game shown in 
Figure 5.7. 

Exercise 5.8 

Find all the subgame perfect Nash equilibria of the game shown in Fig- 
ure 5.8. 



5.6 Nash Equilibrium Refinements 

Subgame perfection is one of many proposed Nash equilibrium refinements. 
These attempt to supplement the definition of a Nash equilibrium with extra 
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Figure 5.8 Game tree for Exercise 5.8. 

conditions in order to reduce the number of equilibria to a more “reasonable” 
set. Ideally, the number of equilibria would be reduced to one, and that equi- 
librium would then be considered the solution of the game. There are two 
problems with this approach. First, the definition of “reasonable” varies ac- 
cording to the situation being modelled. Second, the number of equilibria that 
satisfy the refinement conditions is rarely just one: often several equilibria re- 
main - see Example 5.11. Moreover, while subgame perfect equilibria always 
exist, other types of refinement may lead to some games having no equilibria 
that satisfy the additional conditions. 

Subgame perfection tries to select particular equilibria as being more rea- 
sonable by moving backwards through the game tree. An alternative approach, 
called forward induction, moves forward through the tree. Let us look again at 
Example 5.11. The “problem” with subgame perfection in that game is that it 
does not provide a way to select between the three possible Nash equilibrium 
behaviours in the simultaneous decision subgame. However, if play has reached 
that subgame, player 2 could reasonably assume that player 1 will use C be- 
cause it is only the (C, C) equilibrium that will result in a payoff greater than 
2 (which player 1 could have received by using B at the beginning). Player 
1, realising that their opponent will reach this conclusion, is then confident of 
receiving a payoff of 3 for choosing A at the beginning and, therefore, chooses 
that action instead of B. Thus the equilibrium supported by forward induction 
is {AC, CL). 
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Figure 5.9 Game tree for Exercise 5.9. 

Exercise 5.9 

Consider the game shown in Figure 5.9. Find all the subgame perfect 
Nash equilibria. Which of these equilibria is supported by a forward 
induction argument? 

A problem that is common to both subgame perfection and forward induc- 
tion is that they assume the players will behave rationally (i.e., select Nash 
equilibrium behaviours that satisfy all the supplementary conditions that have 
been deemed reasonable) in parts of the game tree that would not be reached 
if the players act as prescribed by the equilibrium. But if those parts of the 
game tree will only be reached as a consequence of irrational behaviour by one 
or more players, why should we - or, indeed, the players themselves - assume 
that rational behaviour will reassert itself at that point? 

Example 5.13 

Consider the game shown in Figure 5.10. The unique subgame perfect Nash 
equilibrium is one in which both players will play L at every opportunity. 
Therefore, we (and the players) should expect player 1 to use L at the beginning 
of the game - at which point the game ends. Suppose that, contrary to this 
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Figure 5.10 A game that illustrates a problem with subgame perfection. 
The unique subgame perfect Nash equilibrium leads to player 1 using L at 
the beginning of the game. But what should player 2 do if player 1 behaves 
irrationally and gives them the opportunity to make a decision? 

expectation, player 1 uses R and consequently player 2 gets to make a decision. 
What should player 2 do? The backward induction argument, which is based 
on the assumption that player 1 will behave rationally in the future, would 
suggest that player 2 should use L at this point. But player 1 has already 
behaved irrationally once, so perhaps they will do so again. This argument 
suggests that player 2 may be better off choosing R, thus giving player 1 the 
opportunity to use R again. 

The problem is that we are attempting to analyse irrational behaviour on 
the basis that the players are rational. Although this appears to be a problem 
that is impossible to solve, there is one way to cut the Gordian knot. This is 
to assume that the players are perfectly rational in their intentions but that 
they make mistakes in the execution of those intentions. In other words, the 
unexpected behaviour is only apparently irrational. Think of a chess player 
about to move a piece. Two legal moves are available, one of which is better 
than the other. The player picks up the piece and moves it towards the better 
of the finishing positions. However, at the last minute, their hand trembles and 
they place the piece in the “wrong” square. Using this analogy, Nash equilibria 
that remain when the possibility of (small) mistakes is taken into account are 
called trembling hand Nash equilibria. Typically the probability of a mistake is 
characterised by a number e and the limit as £ — >■ 0 is taken. 

This characterisation of unexpected behaviour implies that a mistake by a 
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player at one time does not make it more likely that the same player will make 
another mistake in the future - the “trembles” are uncorrelated. This supports 
the subgame perfect Nash equilibrium in the game shown in Figure 5.10: player 
2 should use L if they get the opportunity, so long as the probability of making 
a mistake is small enough. 

Exercise 5.10 

Consider the game from Example 5.13. Let the probability that player 1 
makes a mistake be e. Find an s such that for all £ < e player 2 should 
use L if given the opportunity. 





6 

Games with Continuous Strategy Sets 



6.1 Infinite Strategy Sets 

For ease of exposition, most of this book is devoted to models in which players 
have discrete and finite strategy sets. However, several classic games describe 
situations in which the players do not choose actions from a discrete set; instead 
their pure strategy sets are subsets of the real line. In this chapter, we give a 
few examples to show how the concepts of game theory are easily extended to 
such cases. Economic models of a duopoly provide examples with pure-strategy 
Nash equilibria, and the so-called War of Attrition has an equilibrium involving 
mixed strategies. 

Suppose the pure strategy (action) sets are a subset of the real line [a, 6] . 
A pure strategy is then a choice x € [a, b] and a mixed strategy is defined by 
giving a function p{x) such that the probability that the choice lies between x 
and x+dx is p{x)dx. The existence of Nash equilibria for games with continuous 
pure-strategy sets was proved independently by Debreu, Glicksburg, and Fan 
in 1952 (see Myerson (1991) or Fudenberg & Tirole (1993) for details). 



6.2 The Cournot Duopoly Model 

A duopoly is a market in which two firms compete to supply the same set 
of customers with the same product. There are three classic duopoly models: 
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the Cournot duopoly, the Bertrand duopoly, and the Stackelberg duopoly - 
all named after their originators. In the Bertrand duopoly, the amount of the 
product consumed is determined by the price at which it is sold and the two 
firms have to decide simultaneously the price at which they will try to sell 
their stock. The firm that sets the lower price “captures the market” and the 
firm that sets the higher price sells nothing. A simple analysis of this situation 
using the elimination of dominated strategies shows that the firms should set a 
price that exactly matches their costs of production (otherwise the other firm 
could undercut their price). In the Cournot and Stackelberg duopoly models, 
the two firms have to decide how much of their product to manufacture, and 
the price at which the product is sold is determined by the total amount made. 
In the Cournot model, the decisions are made simultaneously while in the 
Stackelberg model the decisions are made sequentially with the decision of the 
first firm being public knowledge. Economists sometimes call the solutions of 
these three models a “Bertrand equilibrium”, a “Cournot equilibrium”, and a 
“Stackelberg equilibrium” . However, they are all just Nash equilibria of their 
respective models, with the equilibrium in the Stackelberg case being subgame 
perfect. 

Consider two firms competing for a market by making some infinitely divis- 
ible product, such as petroleum. Cournot’s model is based on allowing the firms 
to choose how much of the product they make, so the set of actions for each 
firm is a range of quantities qi which it could produce. Because the product is 
infinitely divisible, this action set is continuous. 

If Firm i produces an amount Qi of the product, the total amount produced 
is Q = + 92 - The market price of the product is assumed to depend on the 

total supply: 



P{Q) 




So the market price drops from a maximum of Pq when the product is very 
scarce to zero when a quantity Qo is is produced. The production costs are 
assumed to be C{qi) = cqi (i.e., there are no fixed costs and the cost of making 
a unit of the product is the same for each firm). The payoff for each firm is 
given by the profit that it makes in a market determined by the behaviour of 
both firms. The payoff to Firm i is, therefore. 



7i'*(gi,g2) = qiP{Q) - cqi. 



Notice that it certainly makes no sense for either firm to produce a quantity 
greater than Qq, because that would certainly lead to a loss rather than a 
profit. Consequently, we can restrict the action set to the range [0,Qo]- 

We begin by finding the best response for Firm 1 against every possible 
production quantity that Firm 2 could choose. The best response is to choose 
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a production quantity q\ that maximises the profit for Firm 1, given a value of 
<72. So we solve 

^(91,92) = 0 
oqi 



to find 



Qo 



Q 2 



91=^ l-7^-7T ■ 



Qo Pq, 

To check that this is really a best response (and not a “worst response”) we 
calculate 



a^TTi 



5^2-(9i.92) = -(tt 

< 0 . 



Qo 



We also need to confirm that qi + q 2 < Qoj so that the firms are making a 
non-negative profit: 



< 



Qo 




92 C 


2 


Qo -Po 


Qo 


92 


cQo 


2 


+ 2 


2Po 


Qo 


Qo 


cQo 


2 




2Po 


Qo 




— ) 

2Po/ 


Qo- 







92 



Similarly, we find the best response to a choice of <71 is for Firm 2 to produce 

. _ Qo A qi c \ 

2 V Qo Po) ■ 



A pure strategy Nash equilibrium is a pair ((71,(72), each of which is a best 
response to the other. Such a pair can be found by solving the simultaneous 
equations 



* 


Qo 


V Qo 


C 


9i = 


T' 


^0 


* 

92 = 


Qo 

2 


fi-#- 

V Qo 


c 

^0 



(?i = 92 = 



Qo 

3 




= q 



* 

c 



The solution is 
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where we have defined q* as the value of the quantity chosen at the equilibrium 
by each firm. At this equilibrium the payoff to each firm is 

MQcQc) = 



Let us compare this competitive equilibrium with the situation that holds 
under a monopoly. A monopolist maximises 

= qP{q) - cq 



7T2(gc.<?c) 
q:P{2q*) - cql 




and the optimal strategy for a monopolist is, therefore, 




Because q^ < 2q*, the price at which goods are sold is higher for the monopoly 
than it is for the two competing firms. So the model indicates that competition 
operates to benefit the consumer. 

Suppose, instead, the two firms in the duopoly could form a cartel and agree 
to use the strategies 

qi = q 2 = ^q*m ■ 

That is they each produce half of the optimum quantity for a monopolist. Then 
they would receive profits of 





l<l*mPiQ*J - \cq*„ 

qqPq 

8 I PoJ 



which are greater than the Cournot payoff, and the price paid by consumers 
would be the same as they would pay under a monopoly. However, such col- 
lusion is unstable, because the best response to a firm producing the cartel 
quantity is to produce 

q = 



> 



jC <i\ 

2 V 2Qo Po) 

3 * 

4 9m 
1 * 

2?m ■ 



Note that we have not proved that cartels are impossible, only that they will 
not occur in situations described by the Cournot model. We will return to this 
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question in Chapter 7 where we will discuss a model in which cartels can arise. 

Exercise 6.1 

Consider the asymmetric Cournot duopoly game where the marginal 
cost for Firm 1 is ci and the marginal cost for Firm 2 is C 2 - If 0 < Ci < 
^Po Vi, what is the Nash equilibrium? If ci < C 2 < Pq but 2 c 2 > Pq+ci, 
what is the Nash equilibrium? 

Exercise 6.2 

Consider the n-player Cournot game, n identical firms (i.e., identical 
costs) produce quantities qi,q 2 , . . . ,qn- The market price is given by 
P{Q) = Po(l — Q/Qo) where Q = Find the symmetric Nash 

equilibrium (i.e., q* = q* Vi). What happens to each firm’s profit as 
n — 1 oo? 

Exercise 6.3 

Two adjacent countries (labelled by i = {1, 2}) each have industries that 
emit pollution at a level tonnes per annum. Pollution from one country 
has a reduced effect on the other, so that the total level of pollution in 
country 1 is i?i = Ci + ke 2 (where 0 < fc < 1 ) and the total level of 
pollution in country 2 is i ?2 = ^2 + kei. Initially, each country produces 
an amount of pollution cq. However, the parliament in each country 
can vote to reduce the amount of pollution that it produces at a cost 
of c pounds per tonne per annum. The cost to the government-funded 
health service in each country increases with the total level of pollution 
as BoEf. Construct the payoffs Pi ( 61 , 62 ) for each of the countries and 
determine the equilibrium level of pollution produced in each country, 
assuming that the parliaments vote simultaneously. 



6.3 The Stackelberg Duopoly Model 

In the Stackelberg model, two firms (i = 1,2) are competing to sell a divisible 
product and must decide how much of it to produce, qi. As in the Cournot 
model, we assume that the market price for the product is given by 

where Q = qi + q 2 and that the cost of a unit of production for each firm is 
c. Unlike the Cournot duopoly model, decisions are made sequentially: Firm 
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1 (termed the “market leader”) decides on a quantity to produce and this 
decision is observed by Firm 2 (the “market follower”), which then decides on 
the quantity that it will produce. As usual, we assume that each firm wishes 
to maximise its profit and that Pq > c. 

We solve this game by backward induction to find a subgame perfect Nash 
equilibrium. We begin by finding the best response of Firm 2 , 92(91)) for every 
possible choice of production quantity by Firm 1 . Given that Firm 1 knows 
Firm 2 ’s best response to every choice of 91, we can find a Nash equilibrium 
for this game by determining the maximum payoff that Firm 1 can achieve 
given that Firm 2 will always use its best response to any particular choice of 
quantity by Firm 1 . 

Firm 2 ’s profit is 712(91, 92) = q2[P{Q) — c] and the best response to a choice 
of 9i is found by solving 

^(91,92) = 0 
092 



which gives 



Qo 






If Firm 1 chooses 91 and Firm 2 chooses the best response 92(91)) Firm I’s 
profit is 



7>'i(9i) 92(91)) 



9i 

9i 



9i + 92(91) \ 
Qo / 

^0 A _ 9i_ _ ^ 

2 V Qo 





So Firm 1 maximises its profit at 




The Nash equilibrium is, therefore. 



* 

* 

92 




c 



It is interesting to note that although Firm 2 has more information than 
Firm 1 - it knows Firm I’s decision, which has already been made, whereas 
Firm 1 does not know Firm 2 ’s decision which is still in the future - it is Firm 
1 that makes the greater profit (because ql > q^)- 
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Exercise 6.4 



Do consumers do better in the Cournot or in the Stackelberg model? 



The subgame perfect Nash equilibrium derived above is sometimes called 
the “Stackelberg equilibrium”. However, it is not the only Nash equilibrium in 
the Stackelberg model. Another Nash equilibrium is for Firm 1 to produce the 
Cournot quantity and for Firm 2 to produce the Cournot quantity regardless 
of the production of Firm \.li qi = q* then 






= q 



* 

C 



So Firm 2’s best response to q\ = q* is <72 = q*- If Firm 2 always chooses 
<Z2 = 9c then Firm I’s profit is 



7>'i(9i,9c) = 9i 



Po 




9i + 9* A 

Qo ) 




The best response (for Firm 1) to 92 = 9* is found from 



which gives 



diTi 

dqi 



(9i,9c*) = 0 



9i 



2 V Qo Po) 

q*c ■ 



So Firm I’s best response to 92 = 9* is 91 = 9*. Because, for both firms, the 
best response to the other firm producing quantity 9* is to produce the quantity 
9*. the pair of strategies (9*, 9*) is a Nash equilibrium. Although <72 = 9* is a 
best response to 91 = 9*, it is not a best response to 91 yf 9*- Consequently the 
Nash equilibrium (q*,q*) is not subgame perfect. 

Exercise 6.5 



Suppose a firm (the “Entrant”) is considering diversifying into a mar- 
ket that is currently monopolised by another firm (the “Incumbent”). 
Assuming that the market price for the product is given by 

rw) = a(i-|) 

where Q = qi + qs the cost of a unit of production for each firm is c 
and the cost to the Entrant of building manufacturing facilities is Ce, 
should the Entrant diversify? If the Entrant does diversify, should the 
incumbent reveal its production plans or keep them a secret? 
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6.4 War of Attrition 



In a War of Attrition, two players compete for a resource of value v. This 
could be two animals competing for ownership of a breeding territory or two 
supermarkets engaged in a price war. The strategy for each player is a choice 
of a persistence time, ti. The model makes three assumptions: 

1. The cost of the contest is related only to its duration. There are no other 
costs (e.g., risk of injury). 

2. The player that persists the longest gets all of the resource. If both players 
quit at the same time, then neither gets the resource. 

3. The cost paid by each player is proportional to the shortest persistence 
time chosen. (That is, no costs are incurred after one player quits and the 
contest ends.) 

Under these assumptions, the payoffs for the two players are 
and 

There are two pure strategy Nash equilibria. The first is 



if h > t2 
if ti < t2 

if t2 > ti 
if t2 <ti. 



tl = v/c and ^2 = 0 

giving 7Ti(v/c, 0) = v and 7r2(v/c, 0) = 0. This is a Nash equilibrium because, 
for player 1, 

7Ti(ti, 0) = V Vti > 0 

7Tl(0,0) = 0 

which gives 

For player 2, we have 

TT2{vfc,t2) = -Ct2<0 \/t2<v/c 

7T2(v/c,t2) = 0 Vt2>v/c. 



Hence 



T^2{tt,t2) < n2{tl,t2) Vt2 • 
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The second pure strategy Nash equilibrium is 



t* = 0 and ^2 = 



giving 7ri(0,u/c) = 0 and 7T2(0 ,u/c) = v. The conditions showing that this 
strategy pair is a Nash equilibrium are the same as the conditions for the first 
with players 1 and 2 swapped over. 

To find a mixed-strategy Nash equilibrium, it is convenient to consider 
strategies based on the costs of the contest, x = cti and y = ct 2 - In terms of 
costs, the payoffs are 






V — y if X > y 
—X if X < y 



for player 1, and 

T^2{x,y) = 

for player 2. A mixed strategy specifies a choice of cost in the range x to 
X + dx with probability p{x)dx; and (T2 specifies a similar probability density 
q{y). The expected payoff to player 1 if he chooses a fixed cost x against a 
mixed strategy crj is 



( V — X if y > X 

\-y ify<x 



TTi{x,a 2 )= (v-y)q{y)dy+ {-x)q{y)dy 
Jo Jx 

where the first term arises from the probability that player 2 chooses cost y < x, 
and the second term from the probability that player 2 chooses y > x. 

By extension of the Equality of Payoffs Theorem (Theorem 4.27) for ran- 
domising strategies, we must have TTi{x,a 2 ) = constant. That is, for fixed crj; 
7 Ti(x,(T 2) is independent of x so 




Now 

Using the fundamental theorem of calculus, the first term is 

{v - y)q{y)dy = {v - x)q{x) . 

Using the fundamental theorem of calculus and the fact that q{y) is a proba- 
bility density, we have 




dx 




q{y)dy 



A 

dx 



1- q{y)dy 
Jo 



-q{x) ■ 
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So, we have 



Stti 

dx 



pOO 

{v-x)q{x)-l q{y)dy + xq{x) 

J X 

poo 

- / q{y)dy 

J X 



V 

0 . 



From this we can identify q{y) as an exponential probability density, so we 
put q{y) = ke~^y where fc is a normalisation constant. Because 



pOO pOO 

/ q{y)dy =k e~’^ydy = 

J X j X 



we have vke — e 



q{y) = - exp (--) . 

V \ V/ 

In other words, the distribution of costs chosen under the mixed strategy is 
exponential with mean cost v. The same argument for the other player yields 
the same distribution of costs: 

p{x) = - exp 

V \ V / 



(i.e., the equilibrium is symmetric). 

Now that we have found a distribution in terms of costs chosen, we can 
easily find the Nash equilibrium in terms of the distribution of persistence 
times chosen. Using 

, . . .dx 

p[t) =p{x) — 

we have 

, . c 

p[t) = - exp 

V 

That is the distribution of times chosen is exponential with mean v/c. 

Although p{t) is the distribution of persistence times chosen by each player, 
it is not the distribution of contest durations. This distribution can be found 
as follows. 




^(duration <t) = 1 — P(contest is still going at time t) 

= 1 — P(neither player has quit before t) . 



Now 



P(Player i doesn’t quit before t) 






6.4 War of Attrition 



117 



Because the players’ decisions are independent, we have 



ct 



ct 



P(duration <t) = 1 — exp j exp ( 



i.e., contest durations are exponentially distributed with mean v/(2c). 



Exercise 6.6 

Show that the expected payoff when following this strategy is zero. 

Exercise 6.7 



In this section, we assumed that the cost of a contest was linearly re- 
lated to its duration. Find the mixed strategy equilibrium for a War of 
Attrition in which cost = kE. 
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7.1 Repeated Games 

Consider the following two (related) questions. In the Prisoners’ Dilemma, un- 
cooperative behaviour was the predicted outcome although cooperative be- 
haviour would lead to greater payoffs for all players if everyone was coopera- 
tive. Interpreting the Prisoners’ Dilemma as a generalised social interaction, we 
can ask the question: Is external (e.g., governmental) force required in order to 
sustain cooperation or can such behaviour be induced in a liberal, individually 
rational way? In the Cournot duopoly, cartels were not stable. However, in 
many countries, substantial effort is expended in making and enforcing anti- 
collusion laws. So it seems that, in reality, there is a risk of cartel formation. 
How can cartels be stable? 

A clue to a possible resolution of these problems lies in the response many 
people have to the original form of the Prisoners’ Dilemma: it is the fear of 
retaliation in the future that prevents each crook from squealing. In societies, 
individuals often interact many times during their lives, and the effect on the 
future seems to be an important consideration when any decision is made. 
In the business arena, firms make production decisions repeatedly rather than 
just once. So perhaps the cartel can be sustained by making promises or threats 
about what will be done in the future. 

Inspired by these observations, we will now consider situations in which 
players interact repeatedly. The payoffs obtained by players in the game will 
depend on past choices, either because they may condition their strategies on 
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the history of the interaction or because past choices have placed them into a 
different “state”. 

Let us consider first the case when there is only one state in which a partic- 
ular, single-decision game is played. This game is often called the stage game. 
After the stage game has been played, the players again find themselves facing 
the same situation (i.e., the stage game is repeated). Taken one stage at a time, 
the only sensible overall strategy is for a player to use their Nash equilibrium 
strategy for the stage game each time it is played. However, if the game is 
viewed as a whole, the strategy set becomes much richer. Players may condi- 
tion their behaviour on the past actions of their opponents or make threats 
about what they will do in the future if the course of the game does not follow 
a satisfactory path. 

We will restrict our consideration to stage games with a discrete and finite 
strategy set. This may seem like we are ruling out the possibility of discussing 
the stability of cartels. However, the restriction does not, in fact, prevent us 
from considering such questions. 

Exercise 7.1 



Consider the following finite version of the Cournot duopoly model. 
Marginal costs are the same for both firms, and the market price is 
determined by P{Q) = Po(l — QIQo) where Q = qi + (i.e., the sum 

of the production quantities chosen by the two firms). Each firm has a 
pure-strategy set {M, C}, where 

M: produce half the monopolist’s optimum quantity = ‘^(1 — c/Pq) 
C: produce the Cournot equilibrium quantity q* = ^(1 — c/Pq) 

Show that this game has the form of a Prisoners’ Dilemma. 



So, rather than analyse the discrete Cournot game with its complicated 
payoffs, we will look at a Prisoners’ Dilemma game with payoffs: 



P2 





C 


D 


c 


3,3 


0,5 


D 


5,0 


1,1 



The basis for our discussion of repeated games will be the “Iterated Prisoners’ 
Dilemma” in which this stage game will be repeated (iterated) some number 
of times. Initially, we will discuss games with a finite number of repeats. Then 
we will consider games with an “infinite” number of repeats, which can be 
interpreted as meaning that the players are uncertain about when the game 
will end (see Section 3.5). 
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7.2 The Iterated Prisoners’ Dilemma 

First, let us suppose that the Prisoners’ Dilemma is repeated just once so 
that there are 2 stages in all. We solve this just like any dynamic game by 
backward induction. In the final stage, there is no future interaction so the 
only consequences for any choice of strategy is the payoff to be gained in that 
stage. Because the best response is to play D regardless of the opponent’s 
strategy, {D, D) is the Nash equilibrium in this subgame giving a contribution 
of 1 to the total payoff for each player. 

Now consider the first stage. Note that this stage on its own is not a subgame 
- the subgame starting at the beginning of this stage is the whole game. Because 
the strategies have been fixed for the final stage, payoffs for the subgame can be 
calculated by adding the payoffs for the Nash equilibrium in the final stage (i.e., 
1 to each player) to the payoffs for the first stage to create a payoff table for the 
entire game. Note that the pure-strategy set for each player in the entire game 
is S = {CC, DC,CD, DD} but, because we are only interested in a subgame 
perfect Nash equilibrium, we only need to consider a subset of the payoff table. 



P2 





CD 


DD 


CD 


4,4 


1,6 


DD 


6,1 


2,2 



The Nash equilibrium in this game is {DD, DD). So the subgame perfect Nash 
equilibrium for the whole game is to play D in both stages. Note that a player 
cannot induce cooperation in the first stage by promising to cooperate in the 
second stage because they would not keep their promise and the other player 
knows this. Nor can they induce cooperation in the first stage by threatening 
to defect in the second stage, because this is what happens anyway. 

The argument from backward induction can easily be extended to any finite 
number of repeats, leading to the conclusion that the only solution is for both 
players to play D in every stage. In other words, bilateral cooperation (or a 
cartel) is not stable in the finitely repeated Prisoners’ Dilemma. 

Exercise 7.2 

Consider the repeated Prisoners’ Dilemma game with 2 stages using the 
full pure-strategy set S = {CC,DC,CD,DD}. Show that both players 
defecting in each stage is the unique Nash equilibrium. 

Now let us consider an infinite number of repeats, indexed by t = 0, 1, 2, . . .. 
If there is no end to the game (or the players don’t know when it will end), 
then there is no last stage to work backwards from. If the length of the game 
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is infinite, then at any stage there is still an infinite number of stages to go. 
This suggests we should look for a stationary strategy (because all subgames 
look the same). 

Definition 7.1 

A stationary strategy is one in which the rule for choosing an action is the same 
in every stage. Note that this does not necessarily mean that the action chosen 
in each stage will be the same. 

Example 7.2 

The strategies “Play C in every stage” and “Play D in every stage” are obvi- 
ously stationary strategies in the Iterated Prisoners’ Dilemma. The conditional 
strategy “Play C if the other player has never played D and play D otherwise” 
is also stationary. 

The payoff for a stationary strategy is the infinite sum of the payoffs 
achieved in each stage. Suppose that player i receives a payoff ri{t) in stage t. 
Then their total payoff is 

OO 

t=0 

Unfortunately, this straightforward approach leads to a problem. Consider the 
strategy sc =“Play C in every stage”. If both players use this strategy, the 
total payoff to either player is 

OO 

'^i{sc,sc) = 

= OO 

whereas, if one player uses the strategy sd =“Play D in every stage”, then the 
total payoff to the defector is 

7ri(sD,Sc) = 7r2(sc,SD) 

OO 

= 

= OO 

So it is impossible to decide (by comparing total payoffs) whether sd is better 
than Sc- 
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One solution is to discount future payoffs by a factor 6 with 0 < i5 < 1 so 
that a player’s total payoff is 

OO 

t=0 

Depending on the situation being modelled, the discount factor 6 represents 
inflation, uncertainty about whether the game will continue, or a combination 
of these. 

Example 7.3 

With the introduction of a discount factor, the payoff if both players always 
cooperate is 

OO 

Msc,sc) = 

t=0 

3 

and the payoff to a unilateral defector is 

7Ti(sd,Sc) = 7I'2(sC,Sd) 

OO 

= 

t=0 

5 

■ 

So all payoffs are finite. 

Now that we can sensibly compare payoffs achieved by different strategies, 
can permanent cooperation (a cartel) be stable outcome of the infinitely re- 
peated Prisoners’ Dilemma? An answer to this question is provided by the 
following example. 

Example 7.4 

Consider the trigger strategy^sc = “Start by cooperating and continue to co- 
operate until the other player defects, then defect forever after” (this strategy 
is sometimes given the name Grim). If both players adopt this strategy, then 

^ This strategy is called a trigger strategy because a change in behaviour is triggered 
by a single defection. 
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we would observe permanent cooperation and each player would achieve a total 
payoff 

sq) = 3 + 3(5 + 3(5^ -t“ * * * 

3 

■ 

Is (sgjSg) a Nash equilibrium? 

For simplicity, let us assume that both players are restricted to a pure- 
strategy set S = {sg, sc, s_d} (we will relax this constraint in the next section). 
Suppose player 1 decides to use the strategy sc (“always cooperate”) instead. 
Once again, we would observe permanent cooperation and the payoff to each 
player would be 



T’'i(sc,Sg) = 7I"2(sg,Sg) 

3 

■ 



The same result applies if player 2 decides to switch instead, so neither player 
can do better (against sq) by switching to sc- Now consider player 1 using 
the alternative strategy sd (“always defect”) against an opponent who uses 
the trigger strategy sq- Then the sequence of actions used by the players is as 
follows: 

t = 0 

Player 1 (sg): D 

Player 2 (sg): C 

The payoff for player 1 is 

7ri(sG,SG) = 



Player 1 cannot do better by switching to sg from sc if 



1 2 3 4 5 
D D D D D 
D D D D D 



5 + <5 + ^ . 



1 



3 



>5 + 



5 

■ 



(The same inequality arises if it is player 2 that switches.) This inequality is 
satisfied if 




So the pair of strategies (sg, sg) is a Nash equilibrium if the discounting factor 
is high enough. 
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Exercise 7.3 

Consider the Iterated Prisoners’ Dilemma with pure strategy sets Si = 
§2 = {s_D, SC) Sr, s^}. The strategy st is the famous Tit-For-Tat (“Be- 
gin by cooperating; then do whatever the other player did in the previous 
stage”), and s^ is a cautious version of Tit-for-Tat with which a player 
begins by defecting and then does whatever the other player did in the 
previous stage. What condition does the discount factor have to satisfy 
in order for (st, st) to be a Nash equilibrium? 

Exercise 7.4 

Consider the Iterated Prisoners’ Dilemma with pure-strategy sets Si = 
§2 = {s_D,sc,sc} (i-e., unconditional defection, unconditional coopera- 
tion, and the conditional cooperation strategy “Grim” ) . Write down the 
strategic form of the game and find all the Nash equilibria. 

Exercise 7.5 

Consider a game in which the stage game with the payoff table is given 
below is repeated an infinite number of times and payoffs are discounted 
by a factor i5 (0 < 5 < 1) that is common to both players. 





A 


B 


A 


1,2 


3,1 


B 


0,5 


2,3 



Assume that the players are limited to selecting pure strategies from the 
following 3 options. 

Syi: Play A in every stage game. 

SB- Play B in every stage game. 

Sc- Begin by playing B and continue to play B until your opponent plays 
A. Once your opponent has played A, play A forever afterwards. 

Find the condition on 6 such that (sc, sc) is a Nash equilibrium. 



7.3 Subgame Perfection 

The Nash equilibrium where both players adopt the trigger strategy sc is not 
a subgame perfect Nash equilibrium for the following reason. At any point in 






126 



7. Infinite Dynamic Games 



the game, the future of the game (i.e., a subgame) is formally equivalent to the 
entire game. The possible subgames can be divided into 4 classes: (i) neither 
player has played D\ (ii) both players have played H; (iii) player 1 used D in 
the last stage but player 2 did not; and (iv) player 2 used D in the last stage 
but player 1 did not. What does the Nash equilibrium strategy pair {sc,sc) 
specify as the strategies to be used in each of these subgame classes? 

In classes (i) and (ii) there is no conflict with the concept of subgame 
perfection. In class (i), neither player’s opponent has played D so the strategy 
sq specifles that cooperation should continue until the other player defects (i.e., 
sg again). That is the strategy pair specifled for class (i) subgames is (sg, sg)> 
which is a Nash equilibrium of the subgame because it is a Nash equilibrium 
of the entire game. In class (ii), both player’s opponents have defected so the 
Nash equilibrium strategy pair {sg, sg) specifles that each player should play D 
forever. That is, the strategy pair adopted in this class of subgame is (sd,sd) 
which is a Nash equilibrium of the subgame since it is a Nash equilibrium of 
the entire game. 

However, in class (iii), sg specifles that player 1 should switch to using D 
forever because his opponent has just played D. However, player 1 has not yet 
played D so player 2 should continue to use sg (which, indeed, results in the use 
of D from the next round onwards). Thus the Nash equilibrium for the whole 
game specifles that the strategy pair (sd,sg) should be adopted in subgames 
of class (iii). However, this pair is not a Nash equilibrium for the subgame 
because player 2 could obtain a greater payoff by using sd rather than sg- A 
similar argument applies to class (iv) subgames. Hence the Nash equilibrium 
for the entire game does not specify that players play a Nash equilibrium in 
every possible subgame, hence the Nash equilibrium (sg,sg) is not subgame 
perfect. 

Although, {sg, Sg) is not a subgame perfect Nash equilibrium, a very similar 
strategy does lead to a subgame perfect Nash equilibrium when it is adopted 
by both players. Let Sg = “Start by cooperating and continue to cooperate 
until either player defects, then defect forever after”. The pair {sg,Sg) is a 
subgame perfect Nash equilibrium because it specifles that the players should 
play {sd,sd) in the subgames of classes (iii) and (iv). 

Exercise 7.6 

Consider the Iterated Prisoners’ Dilemma with our usual set of payoffs. 
Show that both players using the strategy Tit-for-Tat (“Begin by coop- 
erating; then do whatever the other player did in the last stage”) is not 
a subgame perfect Nash equilibrium if the discount factor is <5 > |. 

So far we have allowed repeated games to have only a limited set of strate- 
gies. Is it possible to allow more general strategies? If we have found a Nash 
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equilibrium candidate from the limited strategy set, can we determine whether 
any other strategy will do better? For example, in the Iterated Prisoners’ 
Dilemma is the strategy sq still a Nash equilibrium strategy if more strategies 
are allowed? If we restrict ourselves to subgame perfect Nash equilibria, then 
an answer to these questions is provided by the “one-stage deviation principle” 
for repeated games. 

Definition 7.5 

A pair of strategies (cti, (T 2 ) satisfies the one-stage deviation condition if neither 
player can increase their payoff by deviating (unilaterally) from their strategy 
in any single stage and returning to the specified strategy thereafter.^ 



Example 7.6 

Consider the Iterated Prisoners’ Dilemma and the subgame perfect Nash equi- 
librium (sg,Sg) with Sg betug the strategy “Start by cooperating and continue 
to cooperate until either player defects, then defect forever after”. Does this 
pair of strategies satisfy the one-stage deviation condition? 

At any given stage, the game will be in one of two classes of subgame: 
either both players have always cooperated or at least one player has defected 
in a previous round. If both players have always cooperated, then Sg specifies 
cooperation in this stage. If either player changes to action D in this stage, 
then Sg specifies using D forever after. The expected future payoff for the 
player making this change is 

which is less than the payoff for continued cooperation if d > ^ (which is just 
the condition for (sg, Sg) to be a Nash equilibrium). If either player has defected 
in the past, then Sg specifies defection in this stage. If either player changes to 
action C in this stage, then Sg still specifies using D forever after. The expected 
future payoff for the player if they make this change is 



0 



1-.5 



which is less than the payoff for following the behaviour specified Sg provided 
5 < 1. Thus the pair (sg, Sg) satisfies the one-stage deviation condition provided 

i < (5 < 1. 



^ Compare this with the policy improvement algorithm for Markov decision processes 
in Section 3.7. 
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Theorem 7.7 (One-stage Deviation Principle) 

A pair of strategies is a subgame perfect Nash equilibrium for a discounted 
repeated game if and only if it satisfies the one-stage deviation condition. 

Proof 

For finitely repeated games, the equivalence of subgame perfection and the one- 
stage deviation condition is guaranteed by the backward induction method. 
(In fact, this shows that a subgame perfect Nash equilibrium payoff cannot 
be improved by deviating in any finite number of stages.) Because the defini- 
tion of subgame perfection implies the one-stage deviation condition for both 
finitely and infinitely repeated games, it only remains to prove that the one- 
stage deviation condition implies that a pair constitutes a subgame perfect 
Nash equilibrium in an infinitely repeated game. 

Suppose that, contrary to the statement of the theorem, a strategy pair 
(cti, (72) satisfies the one-stage deviation condition but is not a subgame perfect 
Nash equilibrium. It follows that there is some stage t at which it would be 
better for one of the players, say player 1, to adopt a different strategy di. That 
is, there is an e such that 



7Ti(di,cr2) - 7 Ti(cti,CT2) > 2e. 

Now consider a strategy that is the same as di from stage t up to stage 
T and is the same as cti from stage T onwards. Because future payoffs are 
discounted 

kl(di,CT 2 ) - 7ri(cr(,(T2)| OC 

so we can choose a T such that 

7ri((Ti,(J2) - 7ri(cr(,(T2) < £. 

Combining the two inequalities, we get 

7I'i(ct(,(J 2) - 7n(cri,(T2) > £T. 

But (Ti and cr( differ at only a finite number of stages, so this inequality con- 
tradicts the one-stage deviation principle for finitely repeated games. It follows 
that a strategy pair cannot satisfy the one-stage deviation condition without 
also being a subgame perfect Nash equilibrium. □ 

Exercise 7.7 

Consider an Iterated Prisoners’ Dilemma with the following payoffs for 
the stage game. 
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Let sp be the strategy (sometimes called Pavlov) “defect if only one 
player defected in the previous stage (regardless of which player it was) ; 
cooperate if either both players cooperated or both players defected in 
the previous stage” . Use the one-stage deviation principle to find a con- 
dition for (sp,sp) to be a subgame perfect Nash equilibrium. 



7.4 Folk Theorems 

The Folk Theorem was given that name because the result was widely known 
long before anyone published a formal proof. Since the original result, there 
have been many variants each of which proves a slightly different result based 
on slightly different assumptions. However, the general flavour of the result is 
always the same: if the Nash equilibrium in a static game is socially sub-optimal, 
players can always do better if the game is repeated and the discount factor 
is high enough. These theorems are often also called “folk theorems” despite 
having a well-attested origin. We have just seen an example of a folk theorem in 
action in the previous section. In the Prisoners’ Dilemma, the Nash equilibrium 
gives each player a poor payoff of 1 compared to the socially optimal payoff 
of 3. This higher payoff can be achieved (in each stage) by both players as an 
equilibrium of the repeated game if the discount factor is large enough. 

In order to be a bit more specific, we will consider a folk theorem that was 
proved by Friedman in 1971. To do this, we need the following definitions. 



Definition 7.8 

Feasible payoff pairs are pairs of payoffs that can be generated by strategies 
available to the players. 



Definition 7.9 

Suppose we have a repeated game with discount factor 5. If we interpret 6 as 
the probability that the game continues, then the expected number of stages 
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in which the game is played is 



T = 



1 

l-,5 



Suppose the two players adopt (not necessarily Nash equilibrium) strategies CTi 
and (72, then the total expected payoff to player i is TTi{ai, (T 2 ) and the average 
payoffs (per stage) are given by 

^7rj((Ti,Cr2) = (1 - <5)7rj((Ti,Cr2). 



Remark 7.10 

The range of feasible payoff pairs in a static game and the range of feasible 
average payoff pairs if that game is repeated are the same. 

Definition 7.11 

Individually rational payoff pairs are those average payoffs that exceed the stage 
game Nash equilibrium payoff for both players. 



Example 7.12 

In the static Prisoners’ Dilemma, pairs of payoffs (tti, 712 ) equal to (1, 1), (0, 5), 
(5, 0), and (3,3) are obviously feasible since they are generated by combinations 
of pure strategies. However, although each player could get a payoff as low as 
0, the payoff pair (0, 0) is not feasible since there is no strategy pair which 
generates those payoffs for the two players. If player 1 and player 2 use strategy 
C with probabilities p and q, respectively, the payoffs are given by 

(7ri,7T2) = (I - p + 4q - pq,l - q + 4p - pq). 

Feasible payoff pairs are found by letting p and q take all values between 0 
and 1. Individually rational payoff pairs are those for which the payoff to each 
player is not less than the Nash equilibrium payoff of 1. See Figure 7.1. 



Theorem 7.13 (Folk Theorem) 

Let ( 77 ^ 772 ) be a pair of Nash equilibrium payoffs for a stage game and let 
(vi,V 2 ) be a feasible payoff pair when the stage game is repeated. For every 
individually rational pair ( 771 ,^ 2 ) (be., a pair such that Vi > ttJ and V 2 > 
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Figure 7.1 Feasible payoffs for the Prisoners’ Dilemma lie within the quadri- 
lateral with vertices (1,1), (0,5), (3,3), and (5,0). Feasible average per-stage 
payoffs for the Iterated Prisoners’ Dilemma lie within the same quadrilateral. 
Individually rational payoffs for the Iterated Prisoners’ Dilemma lie in the 
shaded area. 

ttJ), there exists a 5 such that for all <5 > 5 there is a subgame perfect Nash 
equilibrium with payoffs (ui,U 2 )- 



Proof 

Let (ct*,(T 2 ) be the Nash equilibrium that yields the payoff pair (7r*,7r2). Now 
suppose that the payoff pair (wi, U 2 ) is produced by the players using the actions 
a\ and 02 in every stage (we will consider shortly what happens when this 
assumption is not valid). Consider the following trigger strategy 

“Begin by agreeing to use action a^; continue to use as long as both 
players use the agreed actions; if any player uses an action other than 
Oi, then use crj for ever afterwards.” 

By construction any Nash equilibrium involving these strategies will be sub- 
game perfect, so we only need to find the conditions for a Nash equilibrium. 
Consider another action a'l such that the payoff in the stage game for player 
1 is 7Ti ( 0 ^, 02 ) > Vi- Then the total payoff for switching to a'^ against a player 
using the trigger strategy is not greater than 

7Ti(a(,a2) • 
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It is, therefore, not beneficial to switch to a'^ if <5 > where 

^ _ 7ri(a'i,a2) - 
^ 7Tl(a'i,a2) - 7T^ ' 

By assumption 711(0^,02) > > tt*, so 1 > > 0. A similar argument 

for player 2 leads to a minimum discount factor 82 for player 2. Taking 8 = 
max(8i,82) completes this part of the proof. 

Now we suppose that the payoffs Vi are achieved by using randomising 
strategies at. Assume that there exists a randomising device whose output is 
observed by both players. Assume also that there is an agreed rule for turning 
the output of the randomising device into a choice of action for each player.^ 
These assumptions mean that the strategies themselves (and not just the ac- 
tions that happen to be taken) are observable. If the strategies are observable 
in this way, then the previous argument may be repeated with actions and 
a' being replaced by strategies Ui and cr'. □ 



7.5 Stochastic Games 

A stochastic game is defined by a set of states X with a stage game defined for 
each state. In each state x, player i can choose actions from a set Ai(x). One 

of these stage games is played at each of the discrete times t = 0, 1, 2, The 

choice of actions taken by the players in a particular state determines both the 
immediate rewards obtained by the players and the probability of arriving in 
any other given state at the next decision point. That is, given that the players 
are in state x and choose actions ai € Ai(x) and 02 G A2(x), the players 
receive immediate rewards ri(x, 01,02) and r2(x, 01,02) and the probability 
that they find themselves in state x' for the next decision is p{x'\x, oi, 02). 



Definition 7.14 

A strategy is called a Markov strategy if the behaviour of a player at time t 
depends only on the state x. A pure Markov strategy specifies an action a(x) 
for each state x € X. 

In this section, we will make the following simplifying assumptions. 

® For example, if player 1 has a choice of three actions a, b, and c and is required to 
choose according to p{a) = | , p{b) = | , and p{c) = | . Then the players may agree 
that a normal die should be thrown and that player 1 should choose a if the score 
is 1, b if the score is 2 or 3, and c if the score is 4 or more. 
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1. The length of the game is not known to the players (i.e., the horizon is 

infinite) . 

2. The rewards and transition probabilities are time-independent. 

3. The strategies of interest are Markov. 

If the number of states is small and the number of actions available in each 
state is small, then a stochastic game has a simple diagrammatic representation. 
This diagrammatic representation is best introduced by means of an example. 
(Compare the description of Markov decision processes in Chapter 3.) 

Example 7.15 

The set of states is X = {x, z}. In state x, both players can choose an action 
from the sets A(x) = A(y) = {a, b}. The immediate rewards for player 1 for 
the game in state x are ri(x,a, a) = 4, ri{x,a,b) = 5, ri{x,b,a) = 3, and 
Ti{x,b,b) = 2. This is a zero-sum game so r 2 (a;, 01 , 02 ) = — ri(a;, Oi, 02 ) for all 
action pairs. If players choose the action pair [o, b] in state x, then they move to 
state 2 with probability | and remain in state x with probability If any other 
action pair is chosen, the players remain in state x with probability 1. If the 
players are in state z, then they have the single choice set A(z) = {6} and the 
immediate rewards ri(z,&, 6) = r 2 {z,b,b) = 0. Once the players have reached 
state z, they remain there with probability 1 (so z is a zero-payoff absorbing 
state) . This lengthy description can be presented much more concisely by means 
of the diagram shown in Figure 7.2. 

Consider a game in state x at time t. If we knew the Nash equilibrium 
strategies for both players from time t + 1 onwards, we could calculate the 
expected future payoffs each player would receive from time t -I- 1 onwards 
given that they are starting in a particular state. Let us denote the expected 
future payoff for player i starting in state x by 7r*(a;) (with the * indicating 
that these payoffs are derived using the Nash equilibrium strategies for both 
players). At time t, the players would then be playing a single-decision game 
with payoffs given by 

7Ti(ai,a2)= I ri(x,ai,a2) -f (5 p{x'\x,ai,a2)Trl{x') 

V x'GX 

where we have assumed that future payoffs are discounted by a factor S for 
each time step. We will call this game the effective game in state x. 

For a Markov strategy, the expected future payoffs in state x are indepen- 
dent of time. Therefore, the payoffs for a Markov-strategy Nash equilibrium 
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State — X 



State — z 



Figure 7.2 The payoffs and state transitions for the stochastic game described 
in Example 7.15 and solved in Example 7.16 with a discount factor <5 = |. In 
state X, the players play a zero-sum game. State z is a zero-payoff absorbing 
state. 

are given by the joint solutions of the following pairs of equations (one for each 
state X gX.). 

n*i{x) 

TT^(x) 

Unfortunately, there is no straightforward and infallible method for solving 
these equations. Nevertheless, a solution can often be found relatively easily, 
as shown by the following example. 



= max ri(a:, oi, Oo) + (5 / p(a;'|a;, oi, OoIttUx') 

= max r 2 (a;, a*, 02 ) -f 5 > p{x'\x,al,a 2 )TT 2 {x') ] 



Example 7.16 

Consider the stochastic game with the state transitions and payoffs given in 
Figure 7.2 and discount factor <5 = |. The value of being in state z is zero for 
both players. Let v be the present value"* for player 1 of being in state x (the 
value for player 2 is —v because this is a zero-sum game). This means that in 
state X, the players are facing the following effective game. 

This value is the expected total future payoff. 



4 
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Clearly (&, a) is not a Nash equilibrium for any value of v. Ignoring marginal 
cases, the Nash equilibrium for the effective game in state x will be 

1. (a, a) if V < 3 

2. (b, b) if V > 9 

3. (a, 6) if 3 < V < 9 

Suppose that the players choose (a, a), then u = 4 + => v = 12, which is 

inconsistent with the requirement v < 3. Now suppose the players choose (6, b), 
then V = 2 + |t; v = 6, which is inconsistent with the requirement w > 9. 
Finally, suppose that the players choose (0,6), then v = 5 + v = ^, 

which is consistent with the requirement 3 < u < 9. So the unique Markov- 
strategy Nash equilibrium has the players using the pair of actions (o, b) in 
state X. 

Exercise 7.8 

Construct a two-state stochastic game for an Iterated Prisoners’ Dilemma 
problem in which the subgame perfect strategy Sg (“start by cooperat- 
ing and continue to cooperate until either player defects, then defect 
forever after” ) can be represented as a Markov strategy. Show that both 
players using this strategy is a Markov-strategy Nash equilibrium for the 
stochastic game if 5 > 
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Population Games 



8.1 Evolutionary Game Theory 

So far we have considered two-player games in the framework of Classical Game 
Theory, where the outcome depends on the choices made by rational and con- 
sciously reasoning individuals. The solution for this type of game (the Nash 
equilibrium) was based on the idea that each player uses a strategy that is 
a best response to the strategy chosen by the other, so neither would change 
what they were doing. For symmetric Nash equilibria, (a*, a*), we can give 
an alternative interpretation of the Nash equilibrium by placing the game in 
a population context. In a population where everyone uses strategy cr*, the 
best thing to do is follow the crowd; so if the population starts with every- 
one using a*, then it will remain that way - the population is in equilibrium. 
Nash himself introduced this view, calling it the “mass action interpretation”. 
A natural question to ask is then: What happens if the population is close to, 
but not actually at, its equilibrium configuration? Does the population tend to 
evolve towards the equilibrium or does it move away? This question can be in- 
vestigated using Evolutionary Game Theory, which was invented for biological 
models but has now been adopted by some economists. 

Evolutionary Game Theory considers a population of decision makers. In 
the population, the frequency with which a particular decision is made can 
change over time in response to the decisions made by all individuals in the 
population (i.e., the population evolves). In the biological interpretation of 
this evolution, a population consists of animals each of which are genetically 
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programmed to use some strategy that is inherited by its offspring^. Initially, 
the population may consist of animals using different strategies. The payoff 
to an individual adopting a strategy cr is identified with the fitness (expected 
number of offspring) for that type in the current population. Animals with 
higher fitness leave more offspring (by definition) so in the next generation the 
composition of the population will change. In the economic interpretation, the 
population changes because people play the game many times and consciously 
switch strategies. People are likely to switch to those strategies that give better 
payoffs and away from those that give poor payoffs. 



8.2 Evolutionarily Stable Strategies 

As with any dynamical system, one interesting question is: What are the end- 
points (if there are any) of the evolution? One type of evolutionary end-point 
is called an evolutionarily stable strategy (ESS). 

Definition 8.1 

Consider an infinite population of individuals that can use some set of pure 
strategies, S. A population profile is a vector x that gives a probability x{s) 
with which each strategy s G S is played in the population. 

A population profile need not correspond to a strategy adopted by any 
member of the population. 



Example 8.2 

Consider a population of individuals that can use two strategies si and S2- If 
every member of the population randomises by playing each of the two pure 
strategies with probability then the population profile is x = (5,5). In 
this case, the population profile is identical to the mixed strategy adopted 
by all population members. On the other hand, if half the population adopt 
the strategy si and the other half adopt the strategy S2, then the population 
profile is again x = (5,5), which is not the same as the strategy adopted by 
any member of the population. 

^ This phenotypic gambit was discussed in Section 1.4. 
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Exercise 8.1 

(a) Give three ways in which a population with profile x = might 

arise, (b) Consider an strategy set S = {si, 52 , 53 }- If the population 
consists of 40% of individuals using the strategy ( 5 , 0 , 5 ) and 60% using 
(1,1,0), what is the population profile? 

Consider a particular individual in a population with profile x. If that indi- 
vidual uses a strategy a, then the payoff to that individual is denoted 7r(cr, x). 
(Note that the other “player” is actually the population and does not have a 
payoff.) The payoff for this strategy is calculated by 

= ^p(5)7t(5,x) . 
ses 

These payoffs represent the number of descendants (either through breeding 
or through imitation) that each type of individual has. Therefore, the payoffs 
determine the evolution of the population. 



Example 8.3 

Consider a population of N animals in which individuals are programmed to 
use one of two strategies 5i and 52 . Suppose that 50% of the animals use each 
of the strategies, i.e., x = ( 5 , 5 ) and that, for this current population profile, 

7t(5i,x)=6 and 7t(s2,x)=4. 

In the next generation, there will be 6N/2 individuals using si and 4A^/2 indi- 
viduals using 52 , so the new population profile will be x = (0.6, 0.4). 

In order to proceed with the next generation we need to determine how the 
payoffs change when the population profile alters: that is, we need to know how 
7t(5,x) behaves as a function of x. Mathematically, the distinction is whether 
the payoff is a linear or non-linear function of the various probabilities x(s). 
From a modelling viewpoint, we distinguish between two types of population 
game: games against the field and pairwise contests. 

Definition 8.4 

A game against the field is one in which there is no specific “opponent” for a 
given individual - their payoff depends on what everyone in the population is 
doing. 
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Games against the field are quite different from the games considered by 
Classical Game Theory: one consequence of the population-wide interaction is 
that the payoff to the given individual is not (necessarily) linear in the proba- 
bilities x{s) with which the pure strategies are played by population members. 



Definition 8.5 

A pairwise contest game describes a situation in which a given individual plays 
against an opponent that has been randomly selected (by Nature) from the 
population and the payoff depends just on what both individuals do. 

Pairwise contests are much more like games from Classical Game Theory 
in that we can write 



7t(ct, x) = ^ ^ p{s)x{s')tt{s, s') 

seS s'GS 

for suitably defined pairwise payoffs 7r(s, s'). 

Sometimes games against the field are referred to as “frequency-dependent 
selection”, and the word “game” is reserved for pairwise contests where there 
is an identifiable interaction between two individuals. However, general popu- 
lation games may include interactions of both types, so we will refer to both 
of them as “games” . This will also help us to maintain the distinction between 
Classical and Evolutionary Game Theory, which is often obscured when only 
pairwise contests are considered. 

We are interested in the end points of the evolution of the population. In 
other words, we wish to find the conditions under which the population is stable. 
Let X* be the profile generated by a population of individuals who all adopt 
strategy a* (i.e., x* = a*). A necessary condition for evolutionary stability is 

a* G argmax7r(cr, X*) . 

So, at an equilibrium, the strategy adopted by individuals must be a best 
response to the population profile that it generates. Furthermore, we have the 
population equivalent of Theorem 4.27. 



Theorem 8.6 

Let a* be a strategy that generates a population profile x*. Let S* be the 
support of a*. If the population is stable, then 7r(s,x*) = 7r(<T*,x*) Vs G S*. 
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Proof 

If the set S* contains only one strategy, then the theorem is trivially true. 
Suppose now that the set S* contains more than one strategy. If the theorem 
is not true, then at least one strategy gives a higher payoff than 7 t(ct*,x*). Let 
s' be the action that gives the greatest such payoff. Then 

7r(cr*,x*) = ^ p*(s)7t(s,x*) 

sGS* 

= '^P*{s)Tr{s,yi*) + p*{s')tt{s',x*) 

s^s' 

< '^P*{s)tt{s',x*) +p*{s')tt{s',x*) 

s^s' 

= 7t(s',X*) 

which contradicts the original assumption that the population is stable. □ 

If a* is a unique best response to x*, then the evolution of the population 
clearly stops. However, if there is some other strategy that does equally well in 
the population with profile x* , then the population could drift in the direction 
of the other strategy and its corresponding population profile - unless it is 
prevented from doing so. 

Definition 8.7 

Consider a population where (initially) all the individuals adopt some strat- 
egy a*. Now suppose a (genetic) mutation occurs and a small proportion e of 
individuals use some other strategy cr. The new population (i.e., after the ap- 
pearance of the mutants) is called the post- entry population and will be denoted 
by X£. 

Example 8.8 

Consider a population in which S = {si,S 2 } and a* = (|, |). Suppose the 
mutant strategy is <t = (|, j). Then^ 

Xg = (1 — £)cr* -I- ecr 

lx /3 

- ( 1 -£)( 2 > 2 ) + ^( 4 ’ 4 ^ 

^ There is a slight abuse of notation here because strategies and population profiles 
are different objects. What we mean is that the components of the two vectors are 
equal, i.e., x(s) = p(s), Vs G S. 
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1 el e 

^2 4’ 2 ~ 4 



Definition 8.9 

A mixed strategy a* is an ESS if there exists an e such that for every 0 < e < e 
and every u ^ a* 

7r(cr*,Xe) > 7r(CT,Xe) . 

In other words, a strategy a* is an ESS if mutants that adopt any other 
strategy cr leave fewer offspring in the post-entry population, provided the 
proportion of mutants is sufficiently small. In the next two sections, we consider 
the application of this definition - first in a game against the field and then a 
pairwise contest. 



8.3 Games Against the Field 

Have you ever wondered why the ratio of males to females in (most) human 
(and other animal) populations is 50:50? One way of phrasing the answer is 
because that ratio is an ESS. 

Example 8.10 

Consider game defined by the following conditions. 

1. The proportion of males in the population is /i and the proportion of females 
is 1 — /r. 

2. Each female mates once and produces n offspring. 

3. Males mate (1 — /r)//i times, on average. 

4. Only females “make decisions”^. 

For simplicity, assume that the females’ available pure strategies are either to 
produce no female offspring (si) or to produce no male offspring ( 52 )- With 
this strategy set, a general strategy cr = (p, 1 — p) produces a proportion p of 
male offspring. A population profile x = (x, 1 — x) produces a sex ratio p = x, 
so we can write the population profile naturally in terms of the sex ratio as 

X = (p, 1 - p). 

® That is, only female genes affect the sex ratio of offspring, so Natural Selection 
acts only on females. 
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Because the number of offspring is fixed at n, that clearly cannot be used 
as the payoff (fitness) for a strategy. However, the number of grandchildren 
does vary, so we will use that as the payoff. So, in a population with profile 
X = (/i, 1 — /i), the payoffs are 



f \ 2 ^ 

7r(si,x) = n 


(8.1) 


7t(s2,x) = 


(8.2) 



(n female children each produce n grandchildren for the female, and n male 
children each get (1 — matings and produce n grandchildren from each 
mating). The fitness of a mixed strategy a = {p,l — p) is, therefore. 



7r(cr, x) = ( (1 — p) + p 



1 M 



Because n is independent of the strategy chosen, we can set n = 1 for ease of 
calculation (we are, after all, interested in the sex ratio). 

At this point, it might be tempting to construct a payoff table for the game, 
such as 



Population 





X = 1 


X = 0 


Si 


7t(si,X = 1) 


7t(si, x = 0) 


S2 


7t(s2,X = 1) 


7t(s2, X = 0) 



or, even 



Female 



Population 





Si 


S2 


Si 


7t(si,Si) 


7t(si,S2) 


S2 


7r(s2,si) 


7t(s2,S2) 



However, we should not do this for two reasons. First, the profile (xq = (0, 1)) 
leads to /i = 0, which means the payoff for si is undefined. Second, it might 
tempt us to believe that the pure-strategy payoffs in a general population are 

7r(si,x) = XTT{Si, Si) + (1 - x)Tr{si, S 2 ) 

which they are not: in Equation 8.1, the payoff to the strategy Si is a non-linear 
function of the population profile. 

The first of these problems is an affliction of the simple way we have set up 
the basic model. However, because the evolutionarily stable population will turn 
out to be well away from this state, we can ignore this problem and continue to 
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use the simple formulation. An alternative approach is to specify pure strategies 
that always produce a non-zero proportion of males - see Exercise 8.3. 

Let us now try to find an ESS. Consider the following three cases. 

1. If /i < ^, then females using si (all male offspring) have more grandchildren, 
which (eventually) causes fj, to rise. So, si is not an ESS. 

2. If /i > ^, then females using S 2 (all female offspring) have more grandchil- 
dren, which causes n to fall. So, S 2 is not an ESS. 

3. CT* = (i, i) is a potential ESS, because by Theorem 8.6 

7t(si,X*) = 7t(s2,X*) = 7t((T*,X*) (8.3) 



if the population profile is x* = (|, i) (i.e. n = ^). 

Because Equation 8.3 is a necessary but not sufficient condition for evolu- 
tionary stability, we need to check that a* = (^, ^) is, in fact, an ESS. Let 
a = {p,l — p) then 

Xe = (1 — e)a* + ea 



and 



Me = ^(1 -£) + \ 



The ESS condition is 



7r(CT*,Xe) > 7r(cr,Xe) 



1 

2 



)■ 



where 



and 



1 - Me 



7r(cr,X£) = {l-p)+p 
The difference between the payoffs is 



Me 



7r(cr*,Xe) -7r(CT,Xe) = {p-^) + {^-p) 



1 - Me 
Me 



= 

= 



1 - Me 
Me 

1-2/re 



- 1 



Me 



If this difference is positive for any a = {p,l — p) with p yf | then a* is an 
ESS. Because 



p < 



1 

2 



1 

Me < 2 

7r(cr*,Xe) > 7r(cr,Xe) 
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and 

1 

Me > 2 

7r(cr*,Xe) > 7r((T,Xe) 

the mixed strategy cr* = (|, |) is an ESS. 

Note that we have shown that a monomorphic (“one form”) population 
where everyone uses the strategy cr* is evolutionarily stable, with profile x*. 
However, in this population individuals using si and individuals using S 2 have 
the same fitness as individuals using cr*. So is a polymorphic (“many form”) 
population in which, for example, 50% of animals use si and 50% use S2 also 
stable? This polymorphic population also generates a profile x*, but neither of 
these strategies is an ESS on its own and the ESS formalism cannot deal with 
polymorphisms - we will address this question in the next chapter. 

Exercise 8.2 

Consider a simplified version of the Internet. There are two operating 
systems available to computer users: L and W. A user of system W 
has a basic utility of 1, but L is a better operating system so a user 
of L has a basic utility of 2. If two computers have the same operating 
system, then they can communicate over the network. (N.B. this is not a 
necessary requirement on the real Internet.) A user’s utility rises linearly 
with the proportion of computers that can be communicated with, up 
to a maximum increment of 2. Let x be the proportion of IT-users, then 
7r(W, x) = 1 + 2x and x{L, x) = 2 + 2(1 — a;). What are the ESSs in this 
population game? 

Exercise 8.3 

Consider a sex ratio game in which females can choose between two pure 
strategies: 

Si: produce n offspring in which the proportion of males is 0.8 

S2: produce n offspring in which the proportion of males is 0.2 

Consider a female using the mixed strategy cr = (p, 1 — p) in a popula- 
tion with a proportion of males = p. (a) Find the expected number of 
grandchildren for this female, (b) Hence show that, in a monomorphic 
population, the only possible evolutionarily stable sex ratio has M = 5- 
(c) Find the strategy which leads to p = ^ in a monomorphic population 
and show that it is evolutionarily stable. 



^>2 
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8.4 Pairwise Contest Games 

As an example of a pairwise contest, let’s look at one of the first evolutionary 
games that was invented to model conflict between animals. The two basic types 
are called “Hawks” and “Doves” . However, they are not intended to represent 
two different species of animal; instead they represent two types of behaviour 
(i.e., actions or pure strategies) that could be exhibited by animals of the 
same species. The terminology arises from human behaviour where those who 
advocate pre-emptive military solutions to international problems are called 
“Hawks” while those who would prefer a more diplomatic approach are called 
“Doves” . 

The biological significance of the Hawk-Dove game is that it provides an 
alternative to group-selectionist arguments for the persistence of species whose 
members have potentially lethal attributes (teeth, horns, etc.). The question to 
be answered is the following. Because it is obviously advantageous to fight for 
a resource (having it all is better than sharing) , why don’t animals always end 
up killing (or at least seriously maiming) each other? The group-selectionist 
answer is that any species following this strategy would die out pretty quickly, 
so animals hold back from all out contests “for the good of the species” . The 
“problem” with this is that it seems to require more than just individual-based 
Natural Selection to be driving Evolution. So, if group selection is the only 
possible answer, then that would be a very important result. However, the 
Hawk-Dove game shows that there is an alternative - one that is based fairly 
and squarely on the action of Natural Selection on individuals. So, applying 
Occam’s Razor, there is no need to invoke group selection.^ 

Example 8.11 (The Hawk-Dove Game) 

Individuals can use one of two possible pure strategies 

H : Be aggressive ( “be a Hawk” ) 

D : Be non-aggressive ( “be a Dove” ) . 

In general, an individual can use a randomised strategy which is to be aggressive 
with probability p, i.e., ct = (p, 1 — p). A population consists of animals that are 
aggressive with probability x, i.e., x = (x, 1 — x), which can arise because (i) in 
a monomorphic population, everyone uses the strategy ct = (x, 1 — a;), or (ii) in 

^ William of Occam (1285-1349) was a Franciscan friar. His logical principle, as ex- 
pressed in Summa Totius Logicae, states “frustra fit per plura quod potest fieri 
per pauciora” (it is pointless to do with more what can be done with less). This 
approach was echoed later by Isaac Newton (1642-1727) in the Philosophiae Nat- 
uralis Principia Mathematica: “We are to admit no more causes of natural things 
than such as are both true and sufficient to explain their appearances” . 
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a polymorphic population a fraction x of the population use ajf = (1,0) and a 
fraction 1 — x use an = (0,1). We will consider only monomorphic populations 
for the moment. 

At various times, individuals in this population may come into conflict over 
a resource of value v. This could be food, a breeding site, etc. The outcome 
of a conflict depends on the types of the two individuals that meet. If a Hawk 
and a Dove meet, then the Hawk gains the resource without a fight. If two 
Doves meet, then they “share” the resource. If two Hawks meet, then there is 
a fight and each individual has an equal chance of winning. The winner gets 
the resource and the loser pays a cost (e.g., injury) of c. The payoffs for a focal 
individual are then 



V — C V 

7r(cr,x) =px— hp(l - x)v+ (1 -p)(l - x)- . 



To make things interesting, we assume v < c (this is then “the Hawk-Dove 
game”). It is easy to see that there is no pure-strategy ESS. In a population of 
Doves, X = 0, and 

V 

7r(cr,X£,) = pv + {l-p)- 

= 



so the best response to this population is to play Hawk (i.e., individuals using 
the strategy an = (1, 0) will do best in this population). As a consequence, the 
proportion of more aggressive individuals will increase (i.e., x increases). In a 
population of Hawks, x = 1, and 



v-c 

n{a,XH) 

so the best response to this population is to play Dove (i.e., p = 0 ~ remember 
that we have assumed v < c). 

Is there a mixed-strategy ESS, a* = {p*,l — p*)l For a* to be an ESS, it 
must be a best response to the the population x* = {p* , 1— p*) that it generates. 
In the population x*, the payoff to an arbitrary strategy a = {p,l — p) is 



7r(cr, X*) 



pp*^:^-^ + p{l - p*)v + {1 - p){l 



/, pc 

“-'’b+T 




p") 



V 

2 



If p* < v/c then best response is p = 
response is p = 0 (i.e., p p*). If p* 
gives the same payoff (i.e., 7r(a*,x*) 
So we have 



: I (i.e., p yf p*). If p* > v/c, then the best 
= v/c, then any choice of p {including p*) 
= 7r((j, X*)) and is a best response to x*. 
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(i.e., be aggressive with probability vjc) as a candidate for an ESS. Recall that 
V < c, so this is a proper mixed strategy. 

To confirm that cr* is an ESS we must show that, for every a ^ a*, 

7r(CT*,Xe) > 7r(cr,Xe) 

where the post-entry population profile is 

Xe = ((l-£)p*-P£p, ((l-£)(l-p*)-Pe(l-p)) 

= {P* + £{P- P* + s{p* - P)) ■ 

Now 

7T(cr*,Xe) =p*(p* -P£(p-p*))^^y^ -P p*{l-p* +e{p* -p))v 

+ {l-p*)(l-p* + s{p*-p)f- 

and 

= p{p* + £{p - P*))^^^~Y~ + p{^ - P* + £{p* - p))v 

+ (1 -P)(l -P* + e(p* -P))^ ■ 

So, after a few lines of algebra (using the fact that p* = v/c), we find 
7t(ct*,X£) - 7r(cr,Xe) = ^{p* ~ pf 

> 0 Mp^ p* (i.e. Vct yf a*) 

which proves that cr* is an ESS. 



Exercise 8.4 

Consider a Hawk-Dove game with v > c. Show that playing H is an ESS. 

In order to provide a change of emphasis, we will now consider an economic 
model for the introduction of currency as a medium of exchange. Because we 
do not want to get mired in the economic details, the model will be rather 
schematic. 

Example 8.12 (The Evolution of Money) 

On a remote, tropical island the inhabitants realise that trade could be con- 
ducted more efficiently if they used something as a token for buying and selling, 
rather than exchanging goods directly. On the island there are two objects that 
could be used for this purpose: beads and shells. Each individual can choose 
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to use beads or to use shells, but he can only complete a transaction if the 
person he is attempting to trade with uses the same token. For simplicity, we 
will normalise the payoffs so that a trader gets a utility increment of 1 if the 
transaction succeeds and 0 if it fails. 

The general strategy available to an individual is to use beads with proba- 
bility p: i.e., cr = (p, 1— p). A general population profile is x = (x, 1 — a;): i.e., the 
proportion of individuals in the population who are using beads is x. Assuming 
that an individual attempts to trade with a randomly selected member of the 
population, his payoff is 



7t((t, x) = px + (1 — p){l — x) 
= (1 - a;) -l-p(2a: - 1) . 



From this we see that 

X > - p = 1 and p = 1 a: = 1 
2 

so ctJ = (1,0) is a potential ESS, with a corresponding population profile 
X = (1,0). The post-entry population is 



Xe = (1 -e)(l,0) -ke(p, 1 -p) 

= (1 -e(l -p),e(l -p)) 



In this population, the payoff for an arbitrary strategy is 
7r(cr,Xe) = e(l -p) +p{l - 2e(l -p)) 
and the payoff for the candidate ESS is 

7r(crJ,Xe) = 1 - e(l -p) . 

So 

7!‘(CTb,Xe) - 7r((J,Xe) > 0 

^ (1-p)(1-2£(1-p)) > 0. 

Now, Vp yf p* we have 1 — p > 0, so crj is an ESS if and only if e(l — p) < |. 
That is, the e mentioned in definition 8.9 of an ESS is equal to a half. 

The strategy ct* = (0, 1) is another ESS because in the relevant post-entry 
population, x^ = (ep, 1 — ep), the payoff for an arbitrary strategy is 

7r(cr,X£) = (1 -£p) -p(l - 2ep) 

and the payoff for the candidate ESS is 

7r(crJ,Xe) = 1 - £p . 
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So 



- 7r(cr,Xe) 

p(l - 2ep) 



> 0 
> 0 . 



Now, Vp ^ p* we have p > 0, so ctJ is an ESS if and only if ep < | - i.e., £ = 
The final candidate for an ESS is = (^, |), because 



^=2 



pe[0,l]^x€ [ 0 , 1 ] 



(including, of course, x = 

Xe = 



^). Consider the post-entry population 
(1 -£) 0 +£(P,1 -P) 



The payoff for an arbitrary strategy is 7r(cr, x^) = ^ + 1£(1 — 2p)^ and the payoff 
for the candidate ESS is 7r(cr^,Xe) = |. So 



7r(f7-m,Xe) - 7r(CT,Xe) > 0 

-^£(l-2p)2 > 0. 

Because £ > 0 and p ^ this condition cannot be satisfied; so is not an 
ESS. 

Putting the three results together, we can see that the population of is- 
landers will evolve to use either beads or shells as currency; the final outcome 
depends on the proportion of islanders that initially chooses beads. 



Exercise 8.5 

Consider a Prisoners’ Dilemma where the payoffs for an interaction be- 
tween two individuals are given by 



P2 





C 


D 


c 


3,3 


0,5 


D 


5,0 


1,1 



If a population of individuals play this pairwise contest, what is the ESS? 
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8.5 ESSs and Nash Equilibria 

In this section, we show that the ESSs in a pairwise contest population game 
correspond to a (possibly empty) subset of the set of Nash equilibria for an as- 
sociated two-player game. We restrict our attention to pairwise contest games, 
because the concept of a Nash equilibrium has no meaning for a game against 
the field. 

In a pairwise contest population game, the payoff to a focal individual using 
CT in a population with profile x is 

^ ^ p(s)x(s')7t(s,s') . (8.4) 

sGS s'GS 

This payoff is the same as would be achieved in a two-player game against 
an opponent using a strategy u' that assigns p'{s) = x(s)Vs € S, so we can 
always associate a two-player game with a population game involving pairwise 
contests. 

Definition 8.13 

If a pairwise contest population game has payoffs given by Equation 8.4, then 
the associated two-player game is the game with payoffs given by the numbers® 
7Tl(s, s') = 7t(s, s') = 7T2(s', s). 

In a monomorphic population, if a* is an ESS, then x* = <t*. So, if there 
is a Nash equilibrium in the associated game corresponding to the ESS in the 
population game, then it must be of the form (a*, a*). That is, a symmetric 
Nash equilibrium can be associated with an ESS but an asymmetric one cannot. 



Theorem 8.14 

Let a* be an ESS in a pairwise contest then, Vct yf a* either 

1. 7t((t*,ct*) > 7t(ct, CT*), or 

2. 7r(fT*,CT*) = 7t(ct, CT*) and 7r((j*,cr) > 7r(cr, ct) 

Conversely, if either (1) or (2) holds for each cr <t* in a two-player game, then 
CT* is an ESS in the corresponding population game. 

By convention, player 1 is taken to be the focal player in the population game. 



5 
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Proof 

If a* is an ESS, then, by definition (for e sufficiently small), 

7r(CT*,Xe) > 7r(cr,X£) 

where x^ = (l — e)a*+ea. For pairwise contests, this condition can be rewritten 
as 

(1 — e)Tr{a* ,a*) + STr{a* ,a) > (1 — e)Tr{a, a*) + STr{a, a) . (8-5) 

Converse. If condition I holds, then Equation 8.5 can be satisfied for e suf- 
ficiently small. If condition 2 holds, then Equation 8.5 is satisfied for all 
0 < e < 1. 

Direct. Suppose that Tr{a*,a*) < Tr{a, a*), then 3e sufficiently small that Equa- 
tion 8.5 is violated. So we have 

(8.5) 7r((T*, a*) > 7r(a, a*) . 

If 7 t(ct*, a*) = 7r((j, a*), then 

(8.5) 7r((T*, cr) > 7r(cr, a) . 



□ 



Remark 8.15 

The Nash equilibrium condition is 7r((j*,cr*) > 7r((j, ct*) Vct ^ a* so the condi- 
tion 7r(cr*,CT) > 7r(cr, cr) in (2) is a supplementary requirement that eliminates 
some Nash equilibria from consideration. In other words, there may be a Nash 
equilibrium in the two-player game but no corresponding ESS in the popula- 
tion game. The supplementary condition is particularly relevant in the case of 
mixed-strategy Nash equilibria. 

Theorem 8.14 gives us an alternative procedure for finding an ESS in a 
pairwise contest population game: 

1. write down the associated two-player game; 

2. find the symmetric Nash equilibria of this game; 

3. test the Nash equilibria using conditions (1) and (2) above. 

Any Nash equilibrium strategy a* that passes these tests is an ESS, leading to 
a population profile x* = ct*. 
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Example 8.16 

Consider The Hawk-Dove game again. The associated two-player game is 



Player 2 



Player 1 





H 


D 


H 


v—c v—c 
2 ’ 2 


V, 0 


D 


0, V 


V V 
2 ’ 2 



It is easy to see that (for v < c) there are no symmetric pure-strategy Nash 
equilibria. To find a mixed-strategy Nash equilibrium, we use the Equality of 
Payoffs theorem (Theorem 4.27) 

= TTi{D,a*) 

= 

V 

c 

By the symmetry of the problem, we can deduce immediately that player 1 also 
plays H with probability p* = ((. To show that cr* = (p*, 1 — p*) is an ESS, we 
must show that either condition (1) or condition (2) of Theorem 8.14 holds for 
every a ^ a* . Because a* is a mixed strategy, the Equality of Payoffs theorem 
also tells us that that 7r(cr*, cr*) = 7r(cr, cr*). So condition (1) does not hold, and 
the ESS condition becomes 



7Tl(i7,CT*) 



q*^- + {l-q*)v 



7r(cr*, ct) > 7t(ct, cr) . 

Now 

7r(cr*,CT) = p* p'^-^ + p* {I - p)v + (1 -p*)(l -p)^ 

and 

7r(cr,cr) +p{^-p)v + {l-pf^ . 

So, after a few lines of algebra, we find 

7r(cr*,cr) - 7r(CT,cr) = '^{p* ~ pf 

> 0 \/p ^ p* 

which proves that cr* is an ESS. 

Exercise 8.6 

Find the ESSs for the population games defined by the following two- 
player games. 
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(a) P 2 





R 


G 


B 


R 


1,1 


0,0 


0,0 


G 


0,0 


1,1 


0,0 


B 


0,0 


0,0 


1,1 



(b) P 2 





G 


H 


G 


3,3 


2,2 


H 


2,2 


1,1 



(c) P 2 





A 


B 


A 


4,4 


0,1 


B 


1,0 


2,2 



Exercise 8. 7 



(d) P 2 





H 


D 


H 


1 1 
2’ 2 


2,0 


D 


0,2 


1,1 



A population of birds is distributed so that in any given area there are 
only two females and two trees suitable for nesting (Ti and P 2 ). If the 
two females pick the same nesting site, then they each raise 2 offspring. 
If they choose different sites, then they are more vulnerable to predators 
and only raise 1 offspring each. This situation can be modelled as a 
pairwise contest game, (a) Construct the 2-player payoff table and find 
all the symmetric Nash equilibria of this game, (b) Determine which 
of the Nash equilibria correspond to ESSs in the associated population 
game. 



Example 8.17 

In Section 6.4, we saw that there is a symmetric, mixed-strategy Nash equi- 
librium ((T*,(T*) for the War of Attrition game. An individual following this 
strategy will persist for a time drawn at random from an exponential distribu- 
tion with mean v j c (or, equivalently, will accept a cost x = ct drawn at random 
from an exponential distribution with mean v). Is the strategy a* an ESS? 
Because the Nash equilibrium is mixed, we have 

7t(ct, CT*) = 7r(cr*, CT*) V(T 

(after all, that’s how the equilibrium was found). So we need to show that 

7t(ct*, (j) > 7r((j, ct) Vct 

where ct is any pure or mixed strategy. This seems like a tall order, but fortu- 
nately the task is easier than it appears. If we can show that 



7t(ct*, a;) — 7r(a;, x) > 0 Vx G [0, 00 ) 
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then it follows that 
7r(cr*,cr) 

Now 

7r(cr*,a;) = 



) =£,, [n{a* 


,x) - 


r 1 f 


y\ 


/ - exp 




Jx V ^ 


yy 



2v exp ^ ^ — V 



V(T . 



and 7r(a;,x) = —x. So 

Tr{a*,x) — tt{x,x) = 2v exp — v + x 

which has a minimum value of t;ln(2) at a; = uln(2). This proves that cr* is an 



ESS. 



8.6 Asymmetric Pairwise Contests 

There are many situations in which the players engaged in a contest can be 
distinguished. In economic contexts, they may be a buyer and a seller or they 
may be a firm holding a monopoly in a market and a firm seeking to enter that 
market. In biological problems, they may be male and female birds dividing up 
the care of their offspring or they may be larger and smaller stags competing for 
dominance over a harem of females. Such differences between individuals may 
lead to an asymmetric payoff table: players may have different actions available 
to them or the payoffs may differ according to whether the player is male or 
female, a buyer or a seller. But even if no such payoff asymmetries arise, the 
possibility that players can occupy different roles in a game presents us with 
a problem. For example, it may be reasonable for a male to do one thing and 
a female to do another. How do we allow for such behaviour given that our 
formulation of evolutionary stability requires a symmetric game? 

In a population, an individual may find themselves playing a particular role 
in one game and playing another role in a later encounter. Thus a general strat- 
egy must specify behaviour for all roles: use s in role r, use s' in role r' , and 
so on. By specifying role-conditioned strategies, we obtain a symmetric popu- 
lation game. (At first sight, it may seem strange to specify strategies like “care 
for offspring if male, leave if female” because any given individual is usually 
either male or female throughout its entire life. In genetic terms, however, the 
genes that are assumed to control behaviour will be passed down to offspring 
that may be male or female, whatever the sex of the parent.) Payoffs can be 
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calculated if we know how often an individual assumes a particular role and 
how often the individuals who meet are playing in any two specified roles. For 
simplicity, we will assume that there are only two roles of interest in any game 
and that a player in one role always meets a player in the other role. Such 
a game is said to possess “role asymmetry”. We will also assume, as is often 
the case, that each individual finds themselves playing each role with equal 
probability. For example, in a contest between males and females, a gene has 
a 50% chance of finding itself controlling the behaviour of a male body, if the 
sex ratio is 1:1. 



Example 8.18 

Consider a variation on the Hawk-Dove game in which two individuals are 
contesting ownership of a territory that one of them currently controls. We 
assume that the value of the territory and the costs of contest are the same for 
both players. The difference with the standard Hawk-Dove game is that players 
can now condition their behaviour on the role that they occupy - “owner” or 
“intruder” . Therefore, pure strategies are now of the form “play Hawk if owner, 
play Dove if intruder”, which we will represent by the pair of letters HD (a, 
strategy that is often called “Bourgeois”). The full set of pure strategies is HH, 
HD, DH, and DD.^ We assume that any contest involves one player in each 
role and that each player has an equal chance of being an owner or an intruder. 
(In genetic terms, the genes that are currently in an owner may find themselves 
passed on to an offspring that has yet to find a resource to control.) With these 
assumptions, we can derive the payoff table shown in Figure 8.1. For example, 
consider the expected payoff to players using HH against opponents who use 
HD. Half the time they will be the owner using H against an intruder who 
uses D, and half the time they will be an intruder using H against an owner 
who also uses H. The expected payoff is 

1 Iv — c 3v — c 

-V H = . 

2 2 2 4 

There are two symmetric pure-strategy Nash equilibria: [HD, HD] and 
[DH, DH] . Because (for v < c) 

V 3v — c V 2v — c 



the strategies HD and DH are both ESSs. There is no mixed strategy ESS 
(see Exercise 8.8). 



Surprisingly, the rather bizarre strategy “play Dove if owner, play Hawk if intruder” 
is found in Nature (though rarely). See Maynard Smith (1982). 
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Player 1 



Player 2 





HH 


HD 


DH 


DD 


HH 


v—c v—c 


Zv—C V—C 


3v—c v—c 


V, 0 


2 ’ 2 


4 ’ 4 


4 ’ 4 


HD 


v—c 3v—c 
4 ’ 4 


V V 
2 ’ 2 


2v—c 2v — c 
4 ’ 4 


3v V 
4 ’ 4 


DH 


v—c 3v—c 
4 ’ 4 


2v—c 2v — c 
4 ’ 4 


V V 
2 ’ 2 


3v V 
4 ’ 4 


DD 


0, V 


V 3v 
4 ’ 4 


V 

4 ’ 4 


V V 
2 ’ 2 



Figure 8.1 Payoff table for the asymmetric Hawk-Dove game of Exam- 
ple 8.18. 

Exercise 8.8 

Set V = 4 and c = 8 in the payoff table shown in Figure 8.1 and show 
that there is no mixed strategy ESS. 

The absence of mixed strategy ESSs is a general feature of games with role 
asymmetry, as was shown by Selten in 1980. The proof is easier if we consider 
behavioural rather than mixed strategies. We will consider only games with 
two roles and the same two actions in each role. In such a game, a general 
behavioural strategy can be phrased as “use A with probability pi in role 1, 
use A with probability p 2 in role 2” . If we denote a behavioural strategy by (3 
then we can write 

P = (/3i, /?2) 

where Pi = (pi,l — pp is the behaviour specified for role i. The payoff for P 
against P' is then 

T^{P,P') = ^ 7 '‘(/ 3 i ,/ 32 ) + ^ 7 t (/ 32 , / 30 - 



Theorem 8.19 

In a pairwise contest game that possesses role asymmetry, all evolutionarily 
stable strategies are pure. 

Proof 

Suppose that, contrary to the theorem, P* is a randomising behavioural strat- 
egy that is an ESS. Then, by the equality of payoffs theorem, there is an- 
other strategy /3 for which 7t(/3, /3*) = 7t(/3*,/3*). In fact, there will be many 
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such strategies. Let us pick a (3 that differs from (3* in, say, role 1 so that 
(/?i,/32) = (/3i,/ 32)- Then 

T^0,f3*) = ^’^01,^2) + ^TT 02, (3t) 

which, together with the condition 7r(/3,/3*) = tt{(3* , P*) implies 

7T0i,P*^)=7r{PlP*). 

Hence 

T^00) = ^T^01,P2) +^TT02,0) 

= ^7t(/3i,/ 3^) + i7r(/3J,/3i) 

= l<f^*l,f3*2) + l<P*2 0i) 

= ^7t(/3J,/32) + ^7t(/3J,/3i) 

= 7t(/3*,/3) 

which contradicts our initial assumption. □ 

Remark 8.20 

A more general version of this theorem ~ for games with more than two actions 
and more than two roles - was established by Selten in 1980. However, it is 
important to note that it only applies to pairwise contest games. It does not 
hold in general population games that may have a non-linear population-wide 
component. 

8.7 Existence of ESSs 

Unfortunately, it is not true that all games have an ESS, as is demonstrated 
by the following example. 

Example 8.21 

Consider the two-player (children’s) game “Rock-Scissors-Paper” . The children 
simultaneously make the shape of one of the items with their hand: Rock (R) 
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beats Scissors (S); Scissors beat Paper (P); Paper beats Rock. If both play- 
ers choose the same item, then then game is a draw. One payoff table that 
corresponds to this game is 





R 


s 


p 


R 


0,0 


1,-1 


-1,1 


S 


-1,1 


0,0 


1,-1 


P 


1,-1 


-1,1 


0,0 



This two-player game has a unique Nash equilibrium [a*, a*] with a* = 
(|, but this strategy is not an ESS in the corresponding population game, 
because (for example) 

n{a*,R) = 0 = tt{R,R) . 

However, one important class of games always has at least one ESS. 

Theorem 8.22 

All generic, two-action, symmetric pairwise contests have an ESS. 



Proof 



A symmetric two-player game has the following form. 



P2 





A 


B 


A 


a, a 


b, c 


B 


c, b 


d, d 



By applying affine transformations (see Definition 4.34), we can turn this into 
the equivalent game 



P2 





A 


B 


A 


1 

1 


0,0 


B 


0,0 


d — b,d — b 



It is easy to see that the ESS conditions given in Theorem 8.14 are unaffected 
by this transformation. 

Because we are considering generic games, we have a yf c and b ^ d. There 
are three possibilities to consider. 
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1. If a — c > 0, then tt{A,A) > t:{B,A) and hence a a = (IjO) is an ESS, by 
condition (1) in Theorem 8.14. 

2. If d — & > 0, then Tr{B, B) > Tr{A, B) and hence cjb = (0, 1) is an ESS, by 
condition (1) in Theorem 8.14. 

3. If o — c < 0 and d — b<0, then there is a symmetric mixed strategy^ Nash 
equilibrium [ct*, a*] with a* = {p* , 1 — p*) and 

d-h 

a — c + d — b 

At this equilibrium, 7 t(ct*, a*) = 7r(cr, a*) for any strategy cr, so we have to 
consider the inequality in condition (2) of Theorem 8.14. Now 

7t(ct*, a) = pp*{a — c) + (1 — p)(l — p*){d — b) 

and 

7r((j, cr) = p^(a - c) + (1 -pY{d - b) 
so 

7r(cr*,cr) - 7r(cr,CT) = p{p* - p){a - c) + (I - p){p - p*){d - b) 

= {p* - p) [p{a - c + d-b) - {d-b)] 

= -{a- c+ d-b){p* - pY 

> 0 

So cr* is an ESS. 

Hence, there is always an ESS in the pairwise contest population game that 
corresponds to this two-player game. □ 



Exercise 8.9 

Determine whether the population games defined by the following two- 
player games have an ESS. 
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Apart from the possibility that an ESS may not exist for a given game, it 
is also the case that the interesting Nash equilibrium strategies in some dy- 
namic games turn out not to be ESSs. For example, in the Iterated Prisoners’ 
Dilemma, the Nash equilibrium strategies that are introduced to ensure co- 
operative behaviour ~ “Tit-for-Tat” , “Grim” , and similar strategies - are not 
ESSs. For example, we have® 

7Ti(crc,crc) = 7Ti(a-G,0-c) and 7Ti((Jc,crG) = 7Ti(aG,o-G) 

which means the ESS conditions in Theorem 8.14 do not hold. This problem 
arises because the strategic form of the Iterated Prisoners’ Dilemma dynamic 
game is non-generic and many of the Nash equilibria, therefore, occur in con- 
tinuous sets that provide the same payoff for all points in the set. Because an 
ESS must have a greater payoff than any other strategy in all nearby popu- 
lations, none of these Nash equilibrium strategies can be an ESS. The failure 
of many games to have (interesting) ESSs has led to the search for alternative 
stability concepts that are weaker: these include Neutral Stability, Evolution- 
arily Stable Sets, and Limit ESSs. However, none of these concepts has gained 
universal popularity. So, instead, our focus will now shift from the strategies 
to the evolution of the population structure itself. 



See Section 7.2 for definitions of these strategies. 





9 

Replicator Dynamics 



9.1 Evolutionary Dynamics 

In the previous chapter, we investigated the concept of an evolutionarily stable 
strategy. Although this concept implicitly assumes the existence of some kind 
of evolutionary dynamics, it gives an incomplete description. First, an ESS 
may not exist - in which case the analysis tells us nothing about the evolution 
of the system described by the game. Second, the definition of an ESS deals 
only with monomorphic populations in which every individual uses the same 
strategy. But, if the ESS is a mixed strategy, then all strategies in the support 
of the ESS obtain the same payoff as the evolutionarily stable strategy itself. 
So it is pertinent to ask whether a polymorphic population with the same 
population profile as that generated by the ESS can also be stable. To address 
these questions, we will look at a specific type of evolutionary dynamics, called 
replicator dynamics. 

We consider a population in which individuals, called “replicators” , exist in 
several different types. Each type of individual uses a pre-programmed strategy 
(for the game being considered explicitly) and passes this behaviour to its 
descendants without modification. In the replicator dynamics, it is assumed 
that individuals are programmed to use only pure strategies from a finite set 
S = {si, S2, . . . , Sfc}. Let rii be the number of individuals using Si, then the 
total population size is 

k 

N = 

2=1 
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and the proportion of individuals using Si is 




The population state can then be described by a vector x = (xi,X 2 , ■ ■ • ,Xk) 
(together with the overall size of the population N, which will not interest us). 
Let j3 and 5 be the background per capita birth and death rates in the popu- 
lation. That is, f3 and 5 represent the contributions to the rates of appearance 
and disappearance of individuals in the population which are independent of 
the game in question. The background per capita rate of change of numbers, 
/3 — i5, is modified by the payoff for using strategy Si in the population game 
under consideration. The rate of change of the number of individuals using Si 
is^ 

fii = {(3 - 5 + 7r(si,x))nj 

and the rate of change of the total population size is given by 

k 

IV = 

k k 

= if3 - S)'^ni + '^Tr{si,x)ni 

i=l 

k 

= {j3 - S)N + XiTT{sj,x.) 

i=l 

= {(3 — S + 7t(x))A^ . 

where we have defined the average payoff in the population by 

k 

^(x) = ^Xi7r(5i,x) . 
i^l 

Thus the population grows or declines exponentially. This may not be very 
realistic, but we can improve the description by letting f3 and 6 depend on N. 
So long as the fitness increments 7r(si,x) depend only on the proportions Xi 
and not on the actual numbers rii, the game dynamics will be unchanged. 

From a game-theoretic point of view, we are more interested in how the 
proportions of each type change over time. Now 

hi = Nxi + XiN 

so 

Nxi = hi — XiN 

= {P — S + Tr{si,x.))xiN — Xi{P — 6 + tt{x.))N. 

^ We use a dot to denote a time derivative so that, for example, x = dx/dt. 
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Cancelling and dividing by N, we have 

Xi = - Tt{x))xi. ( 9 . 1 ) 

In other words, the proportion of individuals using strategy Si increases (de- 
creases) if its payoff is bigger (smaller) than the average payoff in the popula- 
tion. 

Exercise 9.1 

Clearly, at any time we should have X)?=i = 1 - Show that if this 

condition is satisfied at time t = 0, then it is satisfied for all t > 0. 

Exercise 9.2 

Show that the evolutionary dynamics is unchanged under an affine trans- 
formation of the payoffs, provided the time parameter is scaled appro- 
priately. (An affine transformation changes the payoffs by tt — >■ Att -|- /x 
where /x is a real number and A is a positive real number.) 



Definition 9.1 

A fixed point of the replicator dynamics is a population that satisfies Xi = 0 
Vx. Fixed points describe populations that are no longer evolving. 

Example 9.2 

Consider a pairwise contest population game with action set A = {E, F} and 
payoffs 

7t(F,F) = 1 7t(F,F) = 1 tt{F,E) = 2 7t(F, F) = 0 . 

So 7t(F,x) = Xi + X2 and 7t(F, x) = 2x\, which gives 

7 t ( x ) = x\{xi + X 2 ) + X2{2xx) 

= x\+ 3 xiX2 . 

The replicator dynamics for this game is 

xi = xi(xi -I- X2 — — 3xia;2) 

X2 = X2{‘2xi — Xi — 3X1X2) . 

So the fixed points are {x\ = 0,X2 = 1 ), (xi = 1,X2 = 1 ) and (xi = ^,X2 = 5). 

Exercise 9.3 

Consider the pairwise contest with payoffs given in the table below 
(where a < b). 
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A 


B 


A 


a — b,a — b 


2a,0 


B 


0,2a 


a, a 



Derive the replicator dynamics equations for this game and find all the 
fixed points. 



9.2 Two-strategy Pairwise Contests 

Dealing with general games requires some mathematical techniques that not 
everyone will be familiar with. So, we will temporarily make a further simplifi- 
cation and consider pairwise contest games that only have two pure strategies. 
Suppose S = {si, S 2 } and let x = x\. Then X 2 = 1 — x and ±2 = —xi- So we 
only need to consider a single differential equation 

X = ( 7 t ( si , x ) — 7t(x))x. 

We can simplify this further by substituting 

7t(x) = X7t(si, x) -I- (1 — x)7t(s2, x) 

which gives 

X = x{l — x)(7t(si,x) — 7t(s2,x)) . 



Example 9.3 

Consider a pairwise contest Prisoners’ Dilemma. The pure strategies are {C, D} 
and the payoffs to the focal individual in the corresponding 2-player game are 
7 t(C', C) = 3, tt{C,D) = 0, tt{D,C) = 5, and tt{D,D) = 1. Let x be the 
proportion of individuals using C, then 

7t(C', x) = 3x -I- 0(1 — x) = 3x 



and 

tt{D, x) = 5x -I- 1(1 — x) = 1 -I- 4x . 

The rate of change of the proportion of individuals using C is 

X = x(l — x)(7t(C', x) — 7r(D, x)) 

= x(l — x)(3x — (1 -I- 4x)) 

= — x(l — x)(l -I- x) . 
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The fixed points for this dynamical system are cc* = 0 and x* = \. We know that 
the unique Nash equilibrium for the Prisoners’ Dilemma game is for everyone 
to defect (play D). This means that x* = 0 corresponds to a Nash equilibrium 
but X* = 1 does not. We also see that i < 0 for x G (0, 1). This means that 
any population that is not at a fixed point of the dynamics will evolve towards 
the fixed point that corresponds to the Nash equilibrium and away from the 
other one. 

Exercise 9.4 

Derive the replicator dynamics for the Hawk-Dove game and show that 
any population that is not at a fixed point will evolve towards the point 
that corresponds to the unique symmetric Nash equilibrium. 

It seems that every Nash equilibrium corresponds to a fixed point in the 
replicator dynamics but not every fixed point corresponds to a Nash equilib- 
rium. The following theorem proves this conjecture for pairwise contest games 
with two pure strategies. 

Theorem 9.4 

Let S = {si,S 2 } and let a* = {p*,l — P*) be the strategy that uses si with 
probability p* . If {a* , cr*) is a symmetric Nash equilibrium, then the population 
X* = (x*,l — X*) with X* = p* is a fixed point of the replicator dynamics 
X = x(l — x)(7t(si, x) — 7t(s2, x)). 

Proof 

If a* is a pure strategy, then x* = 0 or x* = I. In either case, we have x = 0. 
If a* is a mixed strategy, then Theorem 4.27 says that 7 t(si,(t*) = 7r(s2,cr*). 
Now, for a pairwise contest, 

7r(Si,CT*) = p*7r(si,Si) -I- (I -P*)7 t(Sj,S2) 

= 7r(si,x*) . 

So we have 7 t(si,x*) = 7t(s2 7 X*) and consequently x = 0. □ 

We have shown that Nash equilibria in two-player games and fixed points 
in the replicator dynamics are related. Is there a consistent relation between 
the ESSs in a population game and the fixed points? 
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Example 9.5 

Consider a pairwise contest with actions A and B and the following payoffs 
in the associated two-player game: tt{A,A) = 3, n{B,B) = 1 and n{A,B) = 
n{B, A) = 0. The ESSs are for everyone to play A or for everyone to play 
B. The mixed strategy cr = (j, |) is not an ESS. Let x be the proportion of 
individuals using A, then the rate of change of the proportion of individuals 
using A is 



X = a;(l — x)(7r(7l, x) — 7r(i3, x)) 

= a:(l — x)(3a; — (1 — x)) 

= a;(l — x){Ax — 1). 

The fixed points for this dynamical system are x* = 0, x* = 1 and x* = ^. 
However, we can see that i > 0 if x > | and i < 0 if x < so only the pure- 
strategy behaviours are evolutionary end points. If the population starts in a 
state where more than 25% of individuals use strategy A, then the population 
evolves until everyone uses A. On the other hand, If the population starts in a 
state where fewer than 25% of individuals use strategy A, then the population 
evolves until everyone uses B. This means that only the evolutionary end points 
correspond to an ESS. 

In the Hawk-Dove game, the correspondence between the evolutionary 
end-point of the replicator dynamics and the ESS is a bit less direct (see 
Exercise 9.4). The ESS is for each individual to play Hawk with probabil- 
ity v/c. However, in the replicator dynamics, individuals cannot use mixed 
strategies: an individual must either be a pre-programmed Hawk-user or a pre- 
programmed Dove-user. Nevertheless, the population evolves towards a state in 
which the proportion of Hawk-users is v/c, which is the polymorphic equivalent 
of the monomorphic ESS. 

Exercise 9.5 

A population of birds is distributed so that in any given area there are 
only two females and two trees suitable for nesting (Ti and T 2 ). If the two 
females pick the same nesting site, then they each raise 2 offspring. If they 
choose different sites, then they are more vulnerable to predators and 
only raise 1 offspring each. This situation can be modelled as a pairwise 
contest game. Derive the replicator dynamics equation and show that 
only the fixed points that correspond to an ESS are evolutionary end 
points. 
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9.3 Linearisation and Asymptotic Stability 

In the examples considered in the previous section, an ESS always corresponds 
to an evolutionary end point in the replicator dynamics. Do all ESSs have a cor- 
responding end point and do all evolutionary end points have a corresponding 
ESS? In this section, we continue to consider the special case of two-strategy 
pairwise contest games. Later we will consider general n-strategy games and 
reach a similar conclusion. Because the definition of an ESS considers small 
deviations from a specified population, it makes sense to do the same in the 
replicator dynamics. 

Definition 9.6 

A fixed point of the replicator dynamics (or any dynamical system) is said to 
be asymptotically stable if any small deviations from that state are eliminated 
by the dynamics as t 1 oo. 

Example 9.7 

Consider a pairwise contest with pure strategies A and B and the following 
payoffs in the associated two-player game 

7t(A,A)=3 tt{B,B) = 1 tt{A,B) = tt{B,A) = 0. 

We know that the ESSs for this game are for everyone to play A or for everyone 
to play B. The mixed strategy ct = (|, |) is a Nash equilibrium but it is not 
an ESS. Let x be the proportion of individuals using A, then the replicator 
dynamics is 

X = —x(l — x)(l — 4x) 

with fixed points at x* = 0, a;* = 1 and x* = ^. 

First, consider a population near to x* = 0. Let x = x* -I- e = e where we 
must have £ > 0 to ensure x > 0. Then x = e because x* is a constant. Thus 
we have 

£ = — £(1 — £)(1 — 4e) . 

Because it is assumed that £ <C 1, we can ignore terms proportional to £" where 
n > 1. This procedure is called linearisation. Thus 

£ « — £ 



which has the solution 



e{t) = £oC ‘. 





172 



9. Replicator Dynamics 



This tells us that the dynamics reduces small deviations from the population 
state X = (0, 1) (i.e., £ — l 0 as t l oo). In other words, the fixed point x* = 0 
is asymptotically stable. 

Now consider a population near to cc* = 1. Let x = x* — e = 1 — e with 
e > 0 (to ensure a; < 1). Following the linearisation procedure we find that 

e « —3e 

which has the solution 

e{t) = £oe“^‘. 

i.e., X* = 1 is asymptotically stable. 

Finally, consider a population near to x* = \. Let x = x* + e = j+£ (with 
no sign restriction on £). Then we have 



1 




with solution 

e{t) = £oe*/^® 

So xj = I is not asymptotically stable. (In fact, it is unstable.) 

So in this case we find that a strategy is an ESS if and only if the corre- 
sponding fixed point in the replicator dynamics is asymptotically stable. 



Theorem 9.8 

For any two-strategy pairwise contest, a strategy is an ESS if and only if the 
corresponding fixed point in the replicator dynamics is asymptotically stable. 



Proof 

Consider a pairwise contest with strategies A and B. Let x be the proportion 
of individuals using A, then the replicator dynamics is given by 

X = x(l — x)[7r(7l,x) — 7r(i?,x)]. 

There are three possible cases to consider: a single pure-strategy ESS or sta- 
ble monomorphic population; two pure-strategy ESSs or stable monomorphic 
populations; and one mixed strategy ESS or polymorphic population. 

1. Let cr* = (1,0). Then (for cr = (y, 1 — y) with y ^ 1) a* is an ESS if and 
only if 

7r(£l,Xe) - 7r(CT,Xe) > 0 

tt{A, Xe) - yTr{A, x^) - (1 - y)7r(S, x^) > 0 

(1 - y)[7r(£l,Xe) - 7r(S,Xe)] > 0 
7r(Tl,Xe) - 7r(S,Xe) > 0 
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Let X = 1 — £ with e > 0. Then 

e = -£[Tr{A, Xe) - tt{B, x^)]. 

So (T* = (1,0) is an ESS if and only if the corresponding population x* = 1 
is asymptotically stable. 

2. Let a* = (0, 1). Then, using a similar argument to the previous case, a* is 
an ESS if and only if 



- 7r(S,Xe) < 0. 

Let X = £ with £ > 0. Then 

£ = £[TT{A,Xe) - 7r(S,Xe)]. 

So a* = (0, 1) is an ESS if and only if the corresponding population x* = 0 
is asymptotically stable. 

3. Let a* = {p*, 1 — p*) with 0 < p* < 1. Then cr* is an ESS if and only 
if 7r(cr*,cr) > 7r(cr, (j). Taking a = A and a = B in turn, this condition 
becomes the two conditions 

tt{A, A) < tt{B, A) and tt{B, B) < t:{A, B) . 

Let X = X* + £. Then, for a pairwise contest, the replicator dynamics 
equation 

X = x(l- x)[7r(£l,Xe) - 7r(S,Xe)] 

becomes 

£ = x*(l — x*)£ {[t^{A, A) — tt{B, yl)] + [7r(i?, B) — tt{B, A)]) 

using the assumption that x* is a fixed point. So x* is asymptotically stable 
if and only if a* is an ESS. 



□ 

Let F be the set of fixed points and let A be the set of asymptotically 
stable fixed points in the replicator dynamics. Let N be the set of symmetric 
Nash equilibrium strategies and let E be the set of ESSs in the symmetric 
game corresponding to the replicator dynamics. Then we have shown that, for 
any two-strategy pairwise- contest game, the following relationships hold for a 
strategy a* and the corresponding population state x*: 

1. CT* G E X* G A; 
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2. X* G A ^ CT* G N;2 

3. CT* G N X* G F. 

Allowing our now customary abuse of notation that identifies a strategy with 
its corresponding population state, we can write these relations more concisely 
as 

E = A C N C F. 

As we shall see, for pairwise-contest games with more than two strategies these 
relations become 

E C A C N C F. 

Exercise 9.6 

Consider the pairwise contest with payoffs given in the table below 
(where a yf 0). 





A 


B 


A 


a, a 


0,0 


B 


0,0 


a, a 



Find all the ESSs of this game for the cases a > 0 and a < 0. Derive 
the replicator dynamics equation for the proportion of A-players x. Find 
all the fixed points of the replicator dynamics equation (for a > 0 and 
a < 0). Show that only the fixed points that correspond to an ESS are 
asymptotically stable. 



9.4 Games with More Than Two Strategies 

If we increase the number of pure strategies to n, then we have n equations to 
deal with. 

i* = /i(x) i = 

Using the constraint Ym=i introduce a reduced state vector 

(xi,X 2 , ■ . ■ , Xn-i) and reduce the number of equations to n — 1. 

Xi = fi{xi,X2, . ■ .,Xn-i) f = 1, . . . ,n - 1 . 

^ This follows from the first equivalence because a* G E a* G N. 
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We can write this dynamical system more compactly in vector format as^ 



i = f(x) . 



There is no confusion introduced by referring to both types of state with the 
same symbol, x. It will be clear from the context which type of state is being 
referred to. 



Example 9.9 



Consider the following pairwise contest game, which will be used as the basis 
of all the examples in this section. The game has the payoff table 





A 


B 


c 


A 


0,0 


3,3 


1,1 


B 


3,3 


0,0 


1,1 


C 


1,1 


1,1 


1,1 



The replicator dynamics for this game is 

Xi = Xi{3x 2 + Xs — Tt{x.)) 

X 2 = X2{3xi + Xs — Tt{x)) 

X 3 = X3(1-7t(x)) 

with 7t(x) = 6 X 1 X 2 +X 1 X 3 +X 2 X 3 +X 3 . Writing x\ = x, X 2 = y and X 3 = 1—x—y, 
this system can be reduced to the two-variable dynamical system 

X = x{l — x + 2y — Tr(x,y)) 
y = 2/(1 + 2a; - j/ - 7r(x,y)) 

with 7r(a;, y) = 1 + 4xy — x^ — y^. 

Exercise 9. 7 

Find all the Nash equilibria and ESSs for the game in Example 9.9. 
Show that the set of fixed points for the replicator dynamics is the same 
whether we consider the full or the reduced system. 

® Readers who are not familiar with basic dynamical systems theory may find it 
useful to read Appendix B. 
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Definition 9.10 

The replicator dynamics is defined on the simplex 

Z\ = a;2, ■ • • , a:„ I 0 < Xj < 1 Vi & 

An invariant manifold is a connected subset M C 
then x(t) G M for all t > 0. 

It follows immediately from the definition that the fixed points of a dy- 
namical system are invariant manifolds. Boundaries of the simplex A (subsets 
where one or more population types are absent) are also invariant because 

Xi = 0 Xi = 0 . 

Example 9.11 

For the dynamical system 

X = x{l — X + 2y — Tt{x,y)) 
y = 2/(1 + 2a; - y - 7r(x,y)) 

the obvious invariant manifolds are the fixed points (see previous exercise) and 
the boundary lines a; = 0 and y = Q. The boundary line l = 0is 

invariant because (on that line) 

d , . . . 

— (x-Ry) = x + y 

= {x + y-l){l-Tt{x,y)) 

= 0 

The line x = y is also invariant because x = y on that line. 

To obtain a qualitative picture of the solutions of the dynamical system, we 
consider the behaviour of the solutions on (or close to) the invariant manifolds. 
First, let us consider a fixed point x*. By making a Taylor expansion of the 
dynamical system about this fixed point, we obtain a linear approximation to 
the dynamical system (remember that f(x*) = 0 so the constant term vanishes): 

i=i ^ 



i=i ) 

A such that if x(0) G M, 
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Defining = xt — x*, we have 



j=i 1 



which is a linear system ^ with a fixed point at the origin. The matrix L 
has constant components 

and its eigenvalues determine the behaviour of the linearised system at the 
fixed point. Provided the fixed point is hyperbolic (i.e., all eigenvalues have 
non-zero real part), the behaviour of the full, non-linear system is the same.^ 
Combining this information with the behaviour of solutions on the other invari- 
ant manifolds is usually sufficient to determine a complete qualitative picture 
of the solutions to the dynamical system. 



Example 9.12 



Returning to our example, let us consider the fixed point {x*,y*) = (5,5). 
Close to this point we have the linear approximation 




The eigenvalues are found from the characteristic equation det(L — XI) = 0 , 
which yields Ai = — | and A2 = — Because the real parts of both eigenvalues 
are negative, the fixed point is a stable node. Solving the eigenvector equation 




gives the eigenvectors corresponding to each eigenvalue. In this case, we find 
the eigenvector corresponding to A = — | is ^ = —rj, which lies along the 
boundary line x + y = 1 . The eigenvector corresponding to A = — | is ^ = ?7, 
which lies along the line x = y. This eigenvector also passes through the fixed 
point (x*,y*) = (0, 0), which is a good indication that the line x = y might be 
invariant for this dynamical system - as, indeed, we have already shown that 
it is. 

The fixed points (x*, y*) = ( 1 , 0 ) and (x*, y*) = ( 0 , 1 ) both have eigenvalues 
Ai = 3 and A2 = 1 , so both points are unstable nodes. Close to the point 
(x*,j/*) = (0,0) the linear approximation is 



77 



0 

0 



0 

0 




This is the Hartman-Grobman theorem. See Appendix B. 
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(0,1) 




Figure 9.1 The behaviour of the replicator dynamics system from Exam- 
ple 9.9 can be constructed by linking the fixed points by smooth trajectories 
that are consistent with behaviour of the system on or near to the invariant 
manifolds (fixed points and invariant lines). 

which is not hyperbolic (Ai = A 2 = 0). So the linearisation tells us nothing 
about the stability properties of this fixed point. 

Let us now look at the behaviour of the system on the invariant lines. On 
the line y = 0 we have x = x^(x — 1), so x < 0 for 0 < x < 1. Similarly, 
on the line a; = 0 we have y < Q for 0 < y < 1. On the line x = y we have 
X = x^{l — 2x), so X and y are both increasing for 0 < x,?/ < 1. On the line 
x-kj/— l = 0we have 



X = x{3 — 3x — tt{x,1 — x)) 

= x{3 — 9x + 6x^). 

Hence x is increasing {y is decreasing) for 0 < a; < | and x is decreasing {y is 
increasing) for 1 < a; < 1.^ 

Combining all this information we can produce the qualitative picture of 
the dynamics shown in Figure 9.1. 



Exercise 9.8 

Draw a qualitative picture of the replicator dynamics for the pairwise 
contest game with payoff table shown below. 

® Because the point x* = (|, |) is a stable node, any other behaviour would indicate 
that a mistake had been made somewhere. 
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A 


B 


C 


A 


3,3 


0,0 


1,1 


B 


0,0 


3,3 


1,1 


C 


1,1 


1,1 


1,1 



9.5 Equilibria and Stability 

Let F be the set of fixed points and let A be the set of asymptotically sta- 
ble fixed points in the replicator dynamics. Let N be the set of (symmetric) 
Nash equilibrium strategies and let E be the set of ESSs in the symmetric 
game corresponding to the replicator dynamics. We will show that, for any 
pairwise-contest game, the following relationships hold for a strategy cr* and 
the corresponding population state x*: 

1. CT* G E X* G A; 

2. X* G A a* G N; 

3. CT* G N X* G F. 

Allowing our customary abuse of notation that identifies a strategy with its 
corresponding population state, we can write these relations more concisely as 

E C A C N C F. 

First we consider the inclusion N C F. 



Theorem 9.13 

If (cr*, cr*) is a symmetric Nash equilibrium, then the population state x* = a* 
is a fixed point of the replicator dynamics. 



Proof 

Suppose the Nash equilibrium strategy a* is pure, so that every player in the 
population uses some strategy Sj. Then Xi = 0 for i ^ j and 7t(x*) = 7r(sj, x*). 
Hence Xi = 0 VL 

Suppose the Nash equilibrium strategy a* is mixed and let S* be the support 
of cr* (i.e., S* contains only those pure strategies that are played with non-zero 
probability under a*). The equality of payoffs theorem (Theorem 4.27) gives 

7t(s, cr ) = 7r(cr , cr ) Vs G S 
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This implies that, in a polymorphic population with x* = a* , we must have 
for all Si € S* 



7r(si,x*) 



k 

'^'K{Si,Sj)Xj 

i=i 

k 

'^Tl{Si,Sj)pj 

i=i 

7r(Si,CT*) 

constant 



For strategies Sj ^ S*, the condition x* = cr* gives us ccj = 0 and hence Xi = 0. 
For strategies Sj £ S* we have 



Xi = Xi 



^ k \ 

V i=i / 



7t(Sj,X*) -7r(Sj,X*)^Xj 

i=i 



= 0 . 



□ 



Remark 9.14 

Theorem 9.13 shows that an evolutionary process can produce apparently ra- 
tional (Nash equilibrium) behaviour in a population composed of individuals 
who are not required to make consciously rational decisions. In populations 
where the agents are assumed to have some critical faculties - such as human 
populations - the requirements of rationality are much less stringent than they 
are in classical game theory. Individuals are no longer required to be able to 
work through the (possibly infinite) sequence of reaction and counter-reaction 
to changes in behaviour. They merely have to be able to evaluate the conse- 
quences of their actions, compare them to the results obtained by others who 
behaved differently and swap to a better (not necessarily the best) strategy for 
the current situation. The population is stable when, given what everyone else 
is doing, no individual would get a better result by adopting a different strat- 
egy. This population view of a Nash equilibrium was first advanced by Nash 
himself, who called it the “mass action” interpretation. 



Next we consider the inclusion A C N. 
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Theorem 9.15 

If X* is an asymptotically stable fixed point of the replicator dynamics, then 
the symmetric strategy pair [a*, a*] with a* = x* is a Nash equilibrium. 



Proof 

First, we observe that if x* is a fixed point with > 0 Vt (i.e., all pure strategy 
types are present in the population), then all pure strategies must earn the same 
payoff in that population. It follows from the correspondence of a* and x* that 
7Ti(s, a*) = 7t(s, X*) is also constant for all pure strategies s. Therefore, [a* , a*] 
is a Nash equilibrium. 

It remains for us to consider stationary populations where one or more pure 
strategy types are absent. Denote the set of pure strategies that are present by 
S'* C S' (i.e., S* is the support of the fixed point x* and the postulated Nash 
equilibrium strategy a*). Because x* is a fixed point, we must have 7 t(s,x*) = 
7t(x*) Vs G S* and 7 Ti(s,ct*) = 7ri(cr*,cr*) Vs G S*. Now suppose that [a*, a*] 
is not a Nash equilibrium. Then there must be some strategy s' ^ S* for which 
7Ti(s',CT*) > 7ri(cr*,CT*) and consequently for which 7 t(s',x*) > 7r(x*). Consider 
a population x^ that is close to the state x* but has a small proportion £ of s' 
players. Then 

£ = £(7t(s',X£) - 7r(Xe)) 

= £ (7t(s',X*) - 7f(x*)) + 0(£^). 

So the proportion of s'-players increases, contradicting the assumption that x* 
is asymptotically stable. □ 

Finally we consider the inclusion E C A. 



Definition 9.16 

Let X = f(x) be a dynamical system with a fixed point at x*. Then a scalar 
function V (x), defined for allowable states of the system close to x*, such that 

1. E(x*) = 0 

2. E(x) > 0 for X yf X* 

3. < 0 for X yf X* 

is called a (strict) Lyapounov function. If such a function exists, then the fixed 
point X* is asymptotically stable. 





182 



9. Replicator Dynamics 



Theorem 9.17 

Every ESS corresponds to an asymptotically stable fixed point in the replica- 
tor dynamics. That is, if a* is an ESS, then the population with x* = a* is 
asymptotically stable. 



Proof 



If a* is an ESS then, by definition, there exists an e such that for all e < e 
7r((j*,cre) > 7r((T, CTe) 'ia ^ a* 

where erg = (1 — e)a* -I- ea' . In particular, this holds for u = so 7r(cr*, > 

7r((Te,(T£). This implies that in the replicator dynamics we have, for x* = a*, 
X = {1 — e)x* + ex' and all £ < £ 

7t(ct*, a;) > 7f(a;). 



Now consider the relative entropy function 

Clearly E(x*) = 0 and (using Jensen’s inequality E/(a:) > /(Ea;) for any 
convex function, such as a logarithm) 



E(x) = 



> 




-ln(l) 



0 . 



The time derivative of V (x) along solution trajectories of the replicator dy- 
namics is 



dt 



E(x) 



^dx, * 






-E 
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= ~'^—Xi(7r(si,x) - 7r(x)) 

i=i 

= — [7t((T*, x) — #(x)] . 

If (J* is an ESS, then we established above that there is a region near to x* 
where x) — tt{x)] > 0 for x yf x*. Hence 



for population states sufficiently close to the fixed point. E(x) is therefore a 
strict Lyapounov function in this region, and the fixed point x* is asymptoti- 
cally stable. □ 

The three preceding theorems establish the advertised relationship between 
the sets of ESSs (E), symmetric Nash equilibria (N), fixed points (F), and 
asymptotically stable fixed points (A): 

E C A C N C F. 

In general, there may be asymptotically stable fixed points in the replicator 
dynamics which do not correspond to an ESS as is shown in the next exercise. 

Exercise 9.9 

Consider the pairwise contest game with the payoff table below. Show 
that the polymorphic population x* = (|, |, |) is asymptotically stable 
in the replicator dynamics, but that the strategy a* = (|, |, |) is not 
an ESS. [Hint: consider the strategy a = (0, 5, |)]. 





A 


B 


C 


A 


0,0 


1,-2 


1,1 


B 


-2,1 


0,0 


3,1 


C 


1,1 


1,3 


0,0 



If the derivative of the relative entropy function for a fixed point (taken 
along solution trajectories) is positive, then the fixed point is unstable. If the 
derivative is zero, then the fixed point is neither asymptotically stable nor 
unstable: the evolution of the population is periodic around the fixed point. 



Example 9.18 

Consider the Rock-Scissors-Paper game. Let xi be the proportion of i?-players, 
X2 be the proportion of ^-players, and X3 be the proportion of P-players. Then 
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Figure 9.2 A qualitative picture of the replicator dynamics for the Rock- 
Scissors-Paper game analysed in Example 9.18. We have used the reduced state 
vector description with 1 — Xi — X 2 - 

the replicator dynamics system is 

Xi = Xi{x2-X3) 

±2 = X2{x3~Xi) 

is = X3(xi - X 2 ) 

with fixed points (1,0,0), (0,1,0), (0,0,1), and (5,5,5). It is easy to see by 
considering the boundaries that the first three points are not stable. For ex- 
ample, consider the invariant line xi = 0 where, for 0 < X 2 ,X 3 < 1, we have 
X 2 > 0 and ±3 < 0. The results from the three invariant lines together imply 
that there is some kind of oscillatory behaviour about the polymorphic fixed 
point (5, 5, 5): if the fixed point is asymptotically stable, then trajectories will 
spiral into toward it; if it is unstable, then trajectories will spiral out from it. 
The third possibility is that solution trajectories form closed loops around the 
fixed point. That this is, in fact, the case can be confirmed by observing that 
the time derivative of the relative entropy along solution trajectories of the 
replicator dynamics is 

^P(x) = -i(a :2 - 3^3) - ^(a ;3 - a;i) - i(a;i - X2) 

= 0 . 
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A qualitative picture of the replicator dynamics for the Rock-Scissors-Paper 
game is shown in Figure 9.2. 

Exercise 9.10 

Consider a variation of the Rock-Scissors-Paper game in which there is 
a cost to both players (payoff = — c) only if the result is a draw. Show 
that X* = (5,5,5) is asymptotically stable in the replicator dynamics. 





Part IV 



Appendixes 




A 

Constrained Optimisation 



Suppose we want to maximise a function of two variables, f{x,y), subject to 
a constraint g{x,y) = 0 that expresses an implicit relation between x and y. 
In some cases, this implicit relationship may be easily turned into an explicit 
one of the form y = h{x), and the maximum of the function can be found by 
differentiating f{x, h{x)) with respect to x. 



Example A.l 

To maximise the function f{x, y) = x — y^ subject to the constraint x — y = 0, 
rewrite the constraint a,sy = x and differentiate f{x, x) = x—x^. The maximum 
is then at (x*,y*) = (5,5)- 

There is an alternative approach to constrained optimisation: the method 
of Lagrange multipliers. (As we will show later, this method has the advantage 
that it can also be applied in cases where the constraint is in the form g(x, y) < 0 
and direct substitution is impossible.) First we combine the function to be 
maximised, f{x,y), and the function defining the constraint, g{x,y), into a 
single function called the Lagrangian^ 

L{x,y) = f{x,y) - Xg{x,y). 



^ As this function is named after the French mathematician Joseph Louis Lagrange 
(1736-1813), its name is sometimes written as “Lagrangean” . 
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(The, as yet unknown, constant A is called the Lagrange multiplier and will be 
determined once we have found the maximum we require.) Then we perform an 
unconstrained maximisation of the Lagrangian. One way of looking at this pro- 
cedure is to view it as an unconstrained maximisation of the original function 
f{x, y) with an additional penalty for violating the constraint g{x, y) = 0. The 
following theorem shows that following this procedure does, indeed, produce a 
maximum of the original function of interest subject to the imposed constraint. 

Theorem A. 2 

If L(x*,y*) is an unconstrained maximum of the Lagrangian, then the maxi- 
mum of the function f{x, y) subject to the constraint g{x, y) = 0 occurs at the 
point {x*,y*). 

Proof 

The maximum of the Lagrangian is found by solving simultaneously the three 
equations obtained by differentiating with respect to x, y, and A. 

^-A^ = 0 
dx dx 

^-A^ = 0 
dy dy 

g{x,y) = 0 

Note that differentiating by A will always lead to the constraint being satisfied 
at the maximum, if one can be found: i.e., g{x* ,y*) = 0. Then 

f{x*,y*) = L{x*,y*) + Xg{x*,y*) 

= L{x*,y*) 

> L{x,y) Vx,y 

= Vx,y such that g{x,y) = 0 

□ 

Example A. 3 

To maximise the function f{x, y) = x — y^ subject to the constraint x — y = 0, 
introduce the Lagrangian 

L{x, y) = X - y"^ - \{x - y). 
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Differentiating with respect to x, y, and A gives the three equations 

1 - A = 0 
-2y* + A = 0 
X* -y* = 0 

with the solution x* = y* = ^. We also know that A = 1 but that is not relevant 
to the solution of our original problem. 

We will now show how this method can be extended to situations in which 
the constraint is a weak inequality rather than an equality: that is, g{x, y) < 0. 
It is, in fact, quite simple: we construct the same Lagrangian L(a;, y) = f{x, y) — 
Xg{x,y), but now we also require A > 0. 



Theorem A. 4 

If L(x*,y*) is an unconstrained maximum of the Lagrangian and A > 0, then 
the maximum of the function f{x,y) subject to the constraint g{x,y) < 0 
occurs at the point (x*,y*). 



Proof 



Suppose that we have found an unconstrained maximum of the Lagrangian 
with A > 0. Then, as before, we have g{x*,y*) = 0. So 



> 



> 



L{x\y*) 

L{x,y) \/x,y 
f{x,y) - \g{x,y) 

f{x,y) Vx,y such that g{x,y) < 0 



□ 



The following example should help to clarify the use of Theorem A. 4. 

Example A. 5 

Suppose we wish to maximise the function f{x,y) = + y^ subject to the 

constraint x'^+y’^ < 1. First we put the constraint in the required form: x'^+y'^ — 
1 < 0. Then we construct the Lagrangian, L(x, y) = 2x“^ + — A(a:^ + — !)• 

The three equations that must be satisfied simultaneously are 



x{2 - A) = 0 
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2/(1 -A) = 0 

+ y^ = 1 . 

The solutions of these are (i) A = 1, a; = 0 and y = ±1 and (ii) A = 2, x = ±1 
and y = 0. In each case A is positive, as required. But the points (x, y) = (0, ±1) 
and (x,y) = (±1,0) are only extrema of the Lagrangian and not necessarily 
maxima. Because /(O, ±1) = 1 but /(±1, 0) = 2 only the points (x, y) = (±1, 0) 
are maxima of /(x,y). 





B 

Dynamical Systems 



Although it is easy to analyse the behaviour of a one-dimensional dynami- 
cal system, such as the replicator dynamics for two-strategy pairwise contest 
games, it is much more difficult to understand the behaviour of a system in 
two dimensions or more. In Section 9.3, we introduced a linearisation proce- 
dure to provide a connection between the stability of replicator dynamics fixed 
points and ESSs. Because we can easily understand the behaviour of the full, 
non-linear system, it is apparent that the picture of the behaviour near to 
the fixed points is the same whether we consider the full system or its lin- 
earised approximation. If this relationship holds true for systems with two or 
more dimensions^, then we have some hope of understanding the full system 
by considering its linearised approximation. With this in mind, we begin by 
considering linear dynamical systems. 



Linear Dynamical Systems 

For simplicity, let us consider a two-dimensional linear dynamical system. This 
can be written as 



ii = axi + bx2 

X2 = CXi + dX2 



^ It will turn out that this often, but not always, is the case. 
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where a, b, c, and d are constants. (We have assumed that the unique fixed point 
of the system occurs at the origin. If it doesn’t, we can just shift the coordinate 
system so that it does.) We can also write this in matrix form as 



\X2 J \C d ) \X2 ) 

or even more compactly as 

X = Lx 



where 




If the system happens to have the special form 



2:2 





then the solution is straightforward: 



z(t) = 






V22(0)e^^V ■ 



(B.l) 



(B.2) 



We solve the more general case by finding a transformation of variables x — 1 z 
such that Equation (B.l) can be written as 



z = ylz 



(B.3) 



with 

Suppose that such a transformation can be achieved through multiplication by 
a matrix T: z = Tx. Then the solution x(t) of Equation (B.l) can be found by 
applying the inverse transformation to Equation (B.2): 

x(t) = r~^z(t) . 

So we need to find out two things: what are the elements of the matrix T~^ 
and what are the constants A^? Applying T to Equation (B.l) gives 

Tx = TLx 

= TLT"^Tx . 

Comparing this with Equation (B.3) shows that we must have 




TLT~^ = A 





B. Dynamical Systems 



195 



or 

LT-i = T~^A . 



From this we see that the columns of the matrix T~^ must be the eigenvectors 
of the matrix L and the constants Xi are the associated eigenvalues.^ 

Now we can find the solution of the general linear dynamical system we 
began with. Let us write 

T~i ^ f 

\Vi V2 J ' 



Then the solution is^ 



x(t) = T ^z{t) 

_ f Ui W2 \ / Zi{t) \ 

\vi V2 ) \Z2it) ) 

_ f UiZi{t) + U2Z2{t)\ 

V ViZi{t) + V2Z2{t) ) 

= Vi0i(t) + V2 22 (f) 

= vi2i(0)e^^* + V2Z2(0)e'^^* 

where we have introduced 




which are the eigenvectors of L. This solution specifies an evolution of the 
system x(t) = {x{t),y{t)) for each initial state x(0) = (a:(0), i/(0)). The initial 
state determines values of the constants 2 ^( 0 ). 



Remark B.l 

Nothing in the previous discussion actually depends on the number of equations 
being 2. So we can immediately find the solution of an u-dimensional linear 
dynamical system x = Lx (where L is an n x n matrix) . It is 

n 

x{t) = ^ 

i=l 

where the are the eigenvectors and the Xi are the eigenvalues of the matrix 
L and the Ci are constants whose values depend on the initial state. 

^ If v is the eigenvector of a matrix L with associated eigenvalue A, then Lv = Av. 
® For simplicity, we are ignoring “degenerate cases” in which there is either a single 
repeated eigenvalue or one or more of the eigenvalues is zero. 
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Example B.2 

Consider the system of equations 

X = X — y 
y = y-x 



which can be written in matrix form with 




Solving the characteristic equation 



0 



det(L — A/) 

1-A -1 

-1 1-A 

(l-A)^-l 



det 



gives the two eigenvalues as Ai = 2 and A 2 = 0. The corresponding eigenvectors 
are found from the equation Lx = Ax. For Ai = 2 we have 




which gives x = y. Thus, the vector 




(or any scalar multiple of it) is an eigenvector corresponding to the eigenvalue 
Ai = 2. Similarly, an eigenvector for the eigenvalue A 2 = 0 is 




Therefore, the solution of the dynamical system is 



x{t) = zi( 0 )e^‘ + 02(0) 

y{t) = zi(O)e^‘-02(O) 

where the constants Zi(0) are determined by the initial state of the system, 
x(0) and y(0). 



Remark B.3 

In general, both the eigenvalues and the eigenvectors of the matrix L may be 
complex. However, the constants Zi(0) will also be complex in such a way that 
the final expressions for x(t) are always real. 
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We will not be concerned with the exact solution of a linear dynamical 
system because it will, in general, only be an approximation to the full non- 
linear system that we are really interested in. What will be of interest is the 
qualitative behaviour of the system near the fixed point. The solution is the 
sum of terms involving e'^'* where each complex eigenvalue can be written in 
the form \ = a + ioj. Because 

^{a+iuj)t _ e“‘(coswt -I- isinwt) 

we can make the following observations: 

1. If a < 0 for all eigenvalues, then x(t) approaches the fixed point at t — >■ oo. 
That is, the fixed point is asymptotically stable. 

2. If a > 0 for one or more eigenvalues, then x(t) diverges from the fixed 
point along the directions of the corresponding eigenvectors. That is, the 
fixed point is unstable. 

3. For a; yf 0, there is some sort of cyclic behaviour. If the fixed point is stable, 
then x(t) spirals in towards the fixed point as t ^ oo. If the fixed point is 
unstable, then x(t) spirals out from the fixed point. If all eigenvalues are 
imaginary (i.e., have a = 0) then trajectories form ellipses around the fixed 
point. 

Table B.l shows the classification of the fixed points of a linear dynamical 
system x = Lx according to the nature of the eigenvalues of the matrix L. 



Table B.l Classification of the fixed points of the linear dynamical system 
X = Lx according to the real and imaginary parts of the eigenvalues of L. In 
the case “All a yf 0” , some are positive and some negative. 



All a < 0 


w = 0 


Stable node 


All a > 0 


a; = 0 


Unstable node 


All a y^ 0 


w = 0 


Saddle point 


All a < 0 


a; yf 0 


Stable spiral 


All a > 0 


w yf 0 


Unstable spiral 


All a = 0 


a; yf 0 


Centre 
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Non-linear Dynamical Systems 

To obtain a qualitative understanding of a non-linear dynamical system, we 
follow a 4-step procedure: 

1. Find the fixed points of the non-linear system. 

2. Derive a linear approximation of the system close to each fixed point. 

3. Determine the properties of the fixed point in the linearised system (see 
Table B.l). 

4. Combine this information to produce a sketch of the full, non-linear system. 

Example B.4 

Consider the two-dimensional dynamical system 

X = x(l — x)(l — 2y) 
y = j/(i-y)(i-2a:) 

defined on the unit square (i.e., 0 < x,?/ < 1). The fixed points (x*,y*) of this 
system are the set of points {(0, 0)(0, 1)(1, 0)(1, l)^, 5 )}. Now we linearise the 
system close to each of the fixed points in turn. 

(a;*,y*) = (0,0) 

Write ^ = X — X* = X and rj = y — y* = y then 

e = C(l-0(l-2r?) 
i = »7(1 -2 ^ . 

Ignoring non-linear terms (i.e., terms of the form with n -I- m > 1), we 

have the linear approximation 

e = ? 

f] = ri . 

Clearly this fixed point is an unstable node (Ai = A 2 = 1). 

(a:*,j/*) = (0,l) 

Write ^ = X — X* = X and r] = y — y* = y— 1, then 



? = e(l-6(l-2(l-r?)) 

V = -(1 + ??)? 7 ( 1 - 2 ^) . 
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Ignoring non-linear terms, we have the linear approximation 

C = 

i = -V ■ 



Clearly this fixed point is a stable node (Ai = A 2 = —1). 

Writing ^ = 1 — x and rj = y — 0, we find that this fixed point is an unstable 
node. 



Writing ^ = 1 — a; and ij = 1 — y, we find that this fixed point is a stable 
node. 

(x*,y*) = (i, i) 

Write ^ = X — ^ and t] = y — ^ , then 

e = 

V = {\-vn-20- 

Ignoring non-linear terms, we have the linear approximation 



V 



1 




Because the matrix 




has eigenvalues Ai = ^ and A 2 = — | with corresponding eigenvectors 



ei 




and 62 




this fixed point is a saddle point with stable direction x = y and unstable 
direction x = —y. 



The behaviour of the (linearised) system near each of the fixed points is 
shown in Figure B.l. Based on this, it seems reasonable that the behaviour 
of solutions of the full system should (qualitatively) look like that shown in 
Figure B.2. 
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( 0 , 1 ) ( 1 , 1 ) 




Figure B.l The behaviour of the linearised system from Example B.4 near 
each of the fixed points. 

( 0 , 1 ) ( 1 , 1 ) 




( 0 , 0 ) ( 1 , 0 ) 

Figure B.2 Solutions of the full system from Example B.4 can be constructed 
by linking the fixed points in a way indicated by their properties in the linearised 
system, as shown in Figure B.l. 

In the previous example, we went from the picture of a system that was 
based on a linear approximation near the fixed points to a more complete 
(although qualitative) picture of the full system. We did this by joining up 
the fixed points in a way indicated by the properties of the fixed points in the 
linearised system. This procedure relies on two assumptions. First, we have 
assumed that the properties of fixed points in the full system are similar to 
the properties of those fixed points in the linearised approximation. Second, we 
have assumed that the solution that passes through any given point that is not 
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a fixed point is unique and that the solution changes continuously as the given 
point is varied. The validity of these assumptions is confirmed by the following 
two theorems, which we state without proof. 

Definition B.5 

A fixed point of a dynamical system is called hyperbolic if the linearisation of 
the system near the fixed points has no eigenvalues with a zero real part. 

Theorem B.6 

The Hartman- Gohman theorem. If a fixed point is hyperbolic, then the topology 
of the fixed point in the full, non-linear system is the same as the topology of 
the fixed point in the linearised system. 



Remark B.7 

The Hartman-Grobman theorem justifies the use of the linearisation approach 
to discovering the properties of fixed points in a dynamical system in most 
cases. The exceptions are situations where one or more eigenvalues are purely 
imaginary. In such cases, the stability (or otherwise) of the fixed point must be 
determined by other means. 



Theorem B.8 

Consider a dynamical system x = f (x) . If the vector field f has continuous first 
derivatives at a point x(0), then (i) there is a unique solution x(t) that passes 
through x(0) and (ii) the solution x(t) changes smoothly as the point x(0) is 
varied. 



Remark B.9 

In the replicator dynamics for pairwise contest games, the functions f are poly- 
nomials and are, therefore, always suitably well-behaved. In particular, this 
theorem means that trajectories in the replicator dynamics cannot cross. 

As we have already remarked, the stability (or otherwise) of a non-hyperbolic 
fixed point cannot be determined by linearisation. The linearised system has 
eigenvalues with real parts that are zero, and the stability of the system is deter- 
mined by higher order terms in expansion about the fixed point. In such cases, 
we can make use of the direct Lyapounov method (named after the Russian 
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mathematician who devised it). 

Theorem B.IO 

Let X = f(x) be a dynamical system with a fixed point at x*. If we can find 
a scalar function F(x), defined for allowable states of the system close to x*, 
such that^ 

1. F(x*) = 0 

2. I/(x) > 0 for X yf X* 

3. < 0 for X yf X* 

then the fixed point x* is asymptotically stable. 

Proof 

The third condition implies that V is strictly decreasing along solution tra- 
jectories of the dynamical system. Because V (x) > 0 with equality only for 
X = X*, this implies that limt_>oo V = Q and hence limt_>oo x(t) = x*. □ 

Remark B.ll 

The drawback of the direct Lyapounov method is that there is no general 
procedure (apart from trial and error) for constructing a Lyapounov function. 
For the replicator dynamics, however, a class of functions known as relative 
entropy functions seems to work well in many cases. 

Example B.12 

Consider the dynamical system 

X = -y 
y = X — x^y. 

In the linearised system 

i = -V 

?) = T 

A function with these properties is called a strict Lyapounov function. 
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the fixed point at the origin is a centre (both eigenvalues are purely imaginary). 
Because 

^ if + f) = 2^^ + 27777 
= 0 

trajectories (in the linear system) form concentric circles about the origin: 
^(t)^ + 77(t)^ = constant. This tells us nothing about the full non-linear system, 
but it does give the hint that we should try V = -I- 77^ as a Lyapounov 

function. Clearly this function satisfies the first two conditions, and because 

dy r. ■ r. ■ 

— = 2xx + 27777 

at 

= 2a;(-77) -f 2y{x - x^y) 

= -2x^y^ 

< 0 for X yf X* 

it satisfies the third as well. The origin is, therefore, an asymptotically stable 
fixed point. 





Solutions 



Chapter 1 

1.1 Value of F{n) for various n is 



n 


0 


1 


2 


3 


4 


5-f 


F{n) 


1 


9/2 


6 


11/2 


3 


< 0 



So n* = 2 and F{n*) = 6. 

1.2 Because g{0) = 0 and g{b/2) < 0 for ab < 4c 

a;* = I I if > 4c 
f 0 otherwise 



1.3 (a) X* =3 (/'(3) = 0 & /"(3) < 0). 

(b) X* = 2 lf'{x) > 0 for ® e [1, 2]). 

(c) = 3 (/"(®) > 0). 

1.4 The payoff function is the return on the investment, because the two accounts 
are otherwise identical. Let us first assume that the initial capital is included. 
ThenTr(ai) = £1000x 1.06 = £1060 and7r(a2) = £1000 x (1.03)^ = £1060.90. 
So the investor should choose the second account, which pays 3% at six-month 
intervals. The result will be the same if the initial sum is not included (it is an 
affine transformation). 

1.5 (a) Income = qP{q), which is maximised at q = ^qo- 
(b) Profit = q{P{q) — c), which is maximised at 
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1.6 Profit = (p — ci)T(p) — Co, which is maximised at 

* Po + ci 



The maximum profit is only positive if T{p* — ci)(p*) — co > 0. This condition 
is satisfied if 

Tofl-—) >4—. 

V PoJ Po 

If this condition is not satisfied, then the factory should not be built. 

1.7 (a) 7 r(ai) = 7 r(a 2 ) = 1.5 and 7 r(a 3 ) = 1. So o* = fli or 02 but not 03 . 

(b) 7 r(ai) = 0.75, 7 t(o 2 ) = 2.25 and ^{as) = 1. So a* = 02 . 

1.8 The expected profit is 

ru poo 

7t(u) = / {px — cu) f (x) dx / {px — cu — k{x — u))f{x) dx 

Jo Ju 

poo poo 

— / {px — cu) f (x) dx — k / {x — u)f{x)dx 

Jo Ju 

poo poo 

= pd — cu — k xf{x) dx + ku / f{x) dx . 

J u J u 



poo poo 

/ xf(x)dx = (u + d}e~'^^'^ and / f(x)dx = e~'^^’^ 

J u J u 

we obtain 

7 t ( m ) = pd — cu — . 

Differentiating with respect to u and setting the result equal to zero, we find 



u -dlnl- . 



1.9 Because w ~ N(p, a^), the expected utility is 



E(u(w)) = 1 — 



J-c 






By completing the square, we find 



2a2 2 



E(u(t/;)) = 1 — exp \ — Ikp — 



argmaxE(w(i/^)) = argmax ( kp — 



= argmax 1 “ 2 
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1.10 The expected utility is 

7T = E(w) — fcE(w^) 

= E(w) — k (E(w))^ — fcVar(w) . 

From example (1.24) we have 

E(w) = r + a(/i — r) and Var(w) = 
so 

■Tr(a) = r — kr^ + a(/i — r) — 2akr{ji — r) — kc^{n — r)^ — ka^ . 
This payoff has a maximum at 



So 



. (/i — r)(l — 2kr) 

2k{pL — r)2 + 2fccr2 

to if a < 0 
a* — < a ifO<a<l 

I 1 a > 1 



1.11 



1.12 



1.13 



Expected number of surviving offspring is nH (n) . 



n 


0 


1 


2 


3 


4+ 


nH (n) 


0 


0.9 


1.2 


0.3 


< 0 



So nH(n) has a maximnm at n* = 2 



Maximising the expected number of offspring is eqnivalent to maximising the 
adnlt’s probability of survival. The probability of survival, if the bird “chooses” 
sit© % is 

S{i) = (1 - AO {PiMn + (1 - Pi)Mi) . 



So 



S(l) = 0.656 S(2) = 0.666 S(3) = 0.627 

which gives the optimal patch choice as i* — 2. 

The payoff for a general behaviour f3 is 

^p(a) 7 r(a) = p{a) P{X = x)Tr{a\x) 

a^A a^A 



x^X a^A 

= ^ P{X = x)n{[3\x) . 

xGX 



1.14 Because 7 r(ai) = 7 r(a 3 ) = 5 and 7 r(a 2 ) = 3|, optimal randomising behaviours 
have support A* = { 01 , 02 } with p(oi) = p and p{as) = 1 — p (0 < p < 1). 
Using either oi or 03 with probability 1 is also an optimal behaviour. 





208 



Solutions 



Chapter 2 

2.1 (a) The decision tree is 




The pure-strategy set is 

S = {NNN, NND, NDN, NDD, DNN, DND, DDN, DDD} 

and the optimal strategy is NND, which gives a payoff of 20 cents. 

(b) If play has reached the last choice, then it is optimal to choose the dime 
rather than the nickel. At any other point, it is optimal to choose the nickel 
because, for example, choosing the nickel now and the dime next time gives 
a total future payoff of 15 cents compared to the 10 cents gained by choosing 
the dime now. 

(c) At any decision point, the probability that the game will continue for 

another n choices is so the expected future number of decisions is 






fc=i 



p 

i-p 



In this sense, all decision points are the same, so if it is optimal to choose an 
action now, it will be optimal to choose the same action in the future (i.e., the 
optimal strategy is stationary). The expected future payoff for choosing the 
nickel is 



whereas the payoff for choosing the dime is just 10 (because the game then 
stops) . So it is optimal to choose the nickel every time if p > | . 



2.2 The payoff for the behavioural strategy is 



n{f3) = 1 X 15 + i X 10 = 12.5 



The payoff for the mixed strategies is 



= 5x 15+(E®) xl0 + a;xl0 = 12.5 
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2.3 (a) Because all decision points are reached with positive probability by playing 
the mixed strategy, the behavioural equivalent j3 = , (1, 0), (0, 1)) is 

unique. 

(b) Because decision point 3 is not reached by playing the mixed strategy, 
any choice at that point gives a behavioural equivalent. So the equivalent 
behavioural strategies are all of the form 

(^( 1 - 0 ), ,(*, 1 - 2 ;)^ 



with X G [0, 1]. 

2.4 (a) Using the notation C for care, D for desert, and H, M, and L for the choice 
of the high-quality, medium-quality or low-quality male, the decision tree is 

1 




(b) The female should always care and should choose her mate according to 
the rule: 



Choose 



H if \vh > vm 
M if \vh < Vm ■ 
L never 



Chapter 3 

3.1 (a) The payoff is 



7t(co, Cl, C2) = In(co) -b In(ci) -b ln(c 2 ) 
and the constraint is 

Cl C2 ^ 

Co -b — -b ^ Xq 
r 

which gives the solution 



Ct+i = rct with ^ 

(b) By backward recursion the solution is 




* 

Co = 



Xo 

y 



C 2 = X2 
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Substituting in the state equation, we find 

* 1 , ^ 1 2 
Cl = 2 ^{Xo - Co) = 2^3 ®o ^ ^^0 

C2 = r(xi - Cl) = r^xi = rci 
3.2 The correspondence is given in the following table. 



General description 


Example 3.4 


rrixT) 


0 


rt{xt,at) 


In(ct) 


p(a:'|a;, a) 


II 

H 


a{xt,t) 


Ct(xt) 


s 


(co(xo),ci(xi)) 


X 


[0,oo] 


A{x, t) 





3.3 Beginning in state x, the process proceeds as follows 

t=0 t=l t=2 t=3 

P = 1 P=i 

(x, a) ► {y, b) ► {y, b) ► y 




and the expected payoff is 2 + 10 + 1 10 = 17. Beginning in state y, the process 
proceeds as follows 

t=0 t=l t=2 t=3 

P=i P=i 

(y, b) (y, b) «- (y, b) >- y 




and the expected payoff is 10 + |l0 + jlO = 17.5. 

3.4 In state z, there is no choice to be made and ( 2 ) = 0, Vt. The absence of a 
terminal reward vt also gives us tt^{x) = ’n’tijj) = 0. 

At time t = 2 in state x, we have 

7 T 2 (a:|a) = 1 + 713(2/) = ! 

7T2(a;|6) = 2 + 7T3(a:) = 2 
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so a*(x, 2) = b and n 2 {x) = 2. In state y, we have 
7T2(j/|a) = 3 + 7T3(a:) = 3 

7T2(t/|6) = 5 + + ^7r5(2) = 5 

so a*{y,2) — b and 7r|(y) = 5. 

At time t = 1 in state x, we have 

■Ki{x\a) = l + 7r5(y) = 6 
ni{x\b) = 2 + 7r5(a:)=4 

so a*{x, 1) = a and tti(x) = 6. In state y, we have 

ni(yja) = 3 + tt 2 (x) = 5 

7ri(y\b) = b+^-Kl{y)+‘^Til{z) = Q^ 

so a*{y,2) = b and Trt{y) = 6\. 

At time t = 0 in state x, we have 

Tvo{x\a) = l + 7Ti(y) = 7| 

7To(a:|fe) = 2 + 7r*(a:) = 8 
so a* (a;, 0) = 6 and ttq{x) = 8. In state y, we have 

7To(i/|a) = 3 + 7Ti(a;) = 9 

7To(y|6) = 5 + + 1^1(2) = ^ = 6.5625 

so a*{y,2) = a and TVoiv) = 9. 

The solution of the problem is, therefore, the optimal strategy 

t = 0 t = 1 t = 2 
* _ X f b a ^ \ 

^ ~ y \ a b b ) 

where we have ignored state 2 because it is irrelevant whether we say that the 
action in state 2 is a or b. The payoff is 8 if the process starts in state a: or 9 
if the process starts in state y. 

3.5 The payoffs are 

00 .. 

t=o 

^ 9 

^(s2)=^25* = — 

t=0 

so S 2 is better than s\ as we thought it should be. 
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3.6 Following the strategy s — {a(a;) = a,a{y) = a} gives 

7r(a:|s) = 2+^7r(t/|s) 

7r(y|s) = ^7 t(2|s) 

7r(2|s) = 6+^7r(a;|s). 

These can be solved (find 7 t(2|s) first) to give 7r(®|s) = 7 t(2/|s) = 4 and 7 t(2|s) = 
8. Swapping to action b in state x yields a payoff of 1 + |7r(a;|s) = 3. Swapping 
to action b in state y yields a payoff of 1 + |7r(a:|s) = 3. From this we can 
conclude that the strategy we guessed is optimal. 

3.7 The optimal strategy is the same as the one found in Example 3.19. 



Chapter 4 

4.1 One possible payoff table is 



Scarpia 





Fake 

execution 


Real 

execution 


Kill 


7tt(K, F),tvs{K, F) 


7tt(K, R),ns{K, R) 


Sleep 


tvt(S, F),tvs{S, F) 


7vt{S, R),ns{S, R) 



Because Tosca (presumably) prefers to keep her honour in any case, we have 
ttt{K,F) > ttt{S,F) and ttt{K,R) > tvt{S,R). Because Scarpia (presum- 
ably) prefers to do his duty in any case, we have ns{K,R) > tts{K,F) and 
tts{S,R) > tts{S,F). These conditions mean the game inevitably has the 
stated outcome. 

4.2 (a) Eliminate D; then eliminate L, leaving {U, R). (b) Eliminate D (it is dom- 
inated by the mixed strategy of playing U and C each with probability |); 
then eliminate R; then eliminate U, leaving (C, L). Alternatively, eliminate R, 
then eliminate U and D, leaving (C,L). 

4.3 The pair {D, L) is a Nash equilibrium because 

7Ti(cri,L) = lOp -I- 10(1 — p) = 10 = -ki{D,L) 

and 

7T2(-D, (72) = p - (1 - p - q) < p < 7T2(-D, L) . 

Similarly, the pair {U, M) is a Nash equilibrium because 

7Tl (cti , M) = 5 = 7Tl (17, M) 

and 

■X 2 {U, 02 ) = q - - p - q) < q < ^2(11, M) . 
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4.4 (a) Let ai — (p, 1 — p) and a 2 = (q, 1 — q), then 

7Ti(cri, a2) = 1 + q +p(l + q) . 

So the best response for player 1 is di = (1,0) (i.e., use U) whatever player 2 
does. Similarly 

7T2(o-i, (T 2 ) = 1 + q + p . 

So player 2’s best response is always <T 2 = (1,0) (i.e., use L). The unique Nash 
equilibrium is, therefore, {U,L). 

(b) Let (Ti = (p, 1 — p) and ct 2 = {q,l — q), then 7ti(cti, ( 72 ) = g + p(2 — 3g) so 
the best responses for Pi are 

r (0, 1) if 1 > I 

O'! = S (1,0) if? < I 

[ (a;, 1 — x) with a; G [0, 1] if g = |. 

Similarly, 7T2 ((ti, 02 ) = p + q(2 — 3p) so the best responses for P 2 are 

( (0, 1) if P > I 

d-2 = < (1,0) if < I 

I (j/, 1 - 2 /) with 1 / e [0, 1] ifp=|. 

So the complete set of Nash equilibria is (M,R), (P, IT) and (o-i,<t|) with 

at = (72 = (|, I). 



4.5 We underline the best responses for each player. 



P 2 





L 


M 


R 


u 


4,3 


2,7 


0,4 


D 


5,5 


5,-1 


-4,-2 



So the unique pure-strategy Nash equilibrium is {D, L). 

4.6 Let K be the set of integers from 0 to 1000, i.e., K = {0, 1, 2, ..., 999, 1000}. 
The best responses for Si G K are 

Si = 1000 — S2 
S2 = 1000 — Si . 

So any pair of sums of money s* G K and S 2 G K with s} + S 2 = 1000 is a 
Nash equilibrium. (There is also an infinite number of Nash equilibria where 
both sons choose an amount of money greater than £1000!) 





R 


S 


p 


R 


0,0 


1,-1 


-1,1 


S 


-1, 1 


0,0 


1,-1 


P 


1,-1 


-1,1 


0,0 



4.7 



(a) The payoff table is 
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(b) There are clearly no pure-strategy Nash equilibria. Find a mixed-strategy 
equilibrium using the Equality of Payoffs theorem. Let ct 2 = (r, s, 1 — r — s), 
then 

CT2) =^1(5, 0-2) = 7 ri(P, (72) 

2s-|-r — 1 =1 — s — 2r = r — s . 

which can be solved to give r = s = | . By following the analogous procedure for 
player 2, we find the unique Nash equilibrium is (cr*, a*) with a* — |). 

4.8 (i) If o > c, then {A, A) is a symmetric Nash equilibrium, (ii) If d > b, then 
{B, B) is a symmetric Nash equilibrium, (iii) If a < c and d < h, then there is 
no symmetric pure strategy Nash equilibrium, so we look for a mixed strategy 
Nash equilibrium using the Equality of Payoffs theorem. Let al = {p* , 1 — p*) 
and ctJ = (g*, 1 — q*). Then 



and 



TVl{A,a2) 
aq* + 6(1 — q*) 



■ki{B, 0 - 2 ) 
cq* -f- d(l - g*) 
jb-d) 

{c — a) + {b — d) 



We have 0 < 
equilibrium. 



- n : 2 {( yi , A ) = 772(0-1,5) 

ap* -\- b{l — p* ) = cp* -I- d(l — p*) 



p = 



jb-d) 



{c — a) + {b — d) 

= g* < 1 as required for a symmetric mixed strategy Nash 



4.9 The Nash equilibria for both games are (U,L), (D,R) and ((|, |) , (|, |)). 

4.10 (a) Let o-i = (p, 1 — p) and 0-2 = (g, 1 — g). Then 7 ti(o-i, 0 - 2 ) = 6g-|- 5p(l — g), so 



Now 772 ( 0 - 1 , 0 - 2 ) 





Vg < 1 


(*, 1 — *) with X G [0, 1] 


for g = 1 


= 3p + g(l - 4p), so 




f(i,o) 


Vp < 1 


02 = (0, 1) 


Vp > 1 


[ (p, 1 - p) with p G [0, 1] 


for p = i 



Therefore, the Nash equilibria are 



J {{x, 1 — x), C) with X e [0, j] 

\(B,5) 



(b) Let 0-1 = (p, 1— p) and 02 ~ (g, r, 1—q—r). Then 771 ( 0 - 1 , 0 - 2 ) = 2-|-3g—4r -fpr, 
so 



0-1 = 



(1, 0) if r > 0 and Vg G [0, 1) 

(x, 1 — x) with a: G [0, 1] if r = 0 and Vg G [0, 1] . 
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Now 7T2 (o-i, (T 2 ) = 3(1 — p) + rp, so 

.^f(0,l,0) ifp>0 

^ \ (y, z,l - y - z) miii y,z,y + z G [0,1] if p = 0 . 

Therefore, the Nash equilibria are 

f* I (J’’ (y> 0, 1 - y)) with y e [0, 1] 

The best responses for the first game are shown in the figure below. The best 
responses for player 1 are shown by a solid line and those for player 2 by a 
dotted line. Where they meet are the Nash equilibria (indicated by the circle 
and the thick line) 

1 1 : 1 



q 



0 J 



r 

0 



p 



-o 

"1 



4.11 Let p — P(player 1 plays A) and q = P(player 2 plays C), then 

7ri(cri,(72) = (2 - q) + p{Xq - 1) 

7T2(o-i,cr2) = (2-p) + q{Xp-l) ■ 



For A < 1, the best responses are p — q = 0 so (D,R) is the unique Nash 
equilibrium. For A > 1, the best responses are 



ii.(i 

[ any x £ [0, 1] 



and 



q = 



ifq< i 
ifq> i 
if q = i 

1 if P < i 

0 if P > i 

any y G [0,1] if P = y 



The Nash equilibria are {U,L), (D,R), and (ctijctJ) with 



Ui — (T2 



1 A-1 
A’ A 



As A— >■ 1, (crCjCrJ) — >■ {U, L). 

For A 1, the game is generic and the number of equilibria is odd. For A = 1, 
the game is non-generic and the number of equilibria is even. 
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4.12 The payoff table is 





A 


K 


Q 


A 


0,0 


5,-5 


-5,5 


K 


-5,5 


0,0 


5,-5 


Q 


5,-5 


-5,5 


0,0 



This game is just an affine transformation of “Rock-Scissors-Paper” so the 
unique equilibrium is [a* ,a*] with a* = |) (see problem 4.7). 

4.13 Take each of the sixteen possible cases, one-by-one. 



(d-b) 


Sign of 

(a — c) (d — c) 


(a-b) 


Pur e-strategy 
equilibrium 


Mixed-strategy 

equilibrium 


+ 


+ 


+ 


+ 


None 


* * 
p 


+ 


+ 


+ 


- 


(U,L) 


None 


+ 


+ 


- 


+ 


(D,R) 


None 


+ 


+ 


- 


- 


Inconsistent 


+ 


- 


+ 


+ 


{D,L) 


None 


+ 


- 


+ 


- 


(D,L) 


None 


+ 


- 


- 


+ 


(D,R) 


None 


+ 


- 


- 


- 


(D,R) 


None 


- 


+ 


+ 


+ 


{U,R) 


None 


- 


+ 


+ 


- 


{U,L) 


None 


- 


+ 


- 


+ 


iU,R) 


None 


- 


+ 


- 


- 


(U,L) 


None 


- 


- 


+ 


+ 


Inconsistent 


- 


- 


+ 


- 


{D,L) 


None 


- 


- 


- 


+ 


{U,R) 


None 


- 


- 


- 


- 


None 


* * 
p 



Where 



n* = and o* = ~ 

^ (d - c) + (a - fe) ^ (d - &) + (a - c) ■ 

The two inconsistent cases arise because d — b > 0 and a — c > 0 imply that 
a — b — c + d > 0, whereas d — c < 0 and a — b < 0 imply a — b — c + d < 0 (and 
vice versa). 



4.14 The game can be represented by the pair of payoff tables shown below. 
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Pa 



L 


A 


B 


U 


1 , 1,0 


2 , 2,3 


D 


2 , 2,3 


3 , 3,0 



Pa 



R 


A 


B 


U 


-1,-1, 2 


2,0,2 


D 


0,2,2 


1,1,2 



By inspection, there are no pure strategy equilibria. Let player 1 choose U 
with probability p, let player 2 choose L with probability q, and let player 3 
choose A with probability r. The three equations that must be satisfied for 
mixed strategies are 



7Tl(f/, CT2,0-3) = -7ri(P), 0-2, 0-3) 

7T2(cri,P,(T3) = 7r2(cTl,P, (T3) 

7 T 3 (o-i, 0-2, 7 l) = 7 r 3 (cri,CT 2 ,P). 

These yield the following three conditions for p, q, and r: 

2 {q + r — qr) — 1 = 0 
2 {p + r — pr) — 1 = 0 
3 {p + q- 2 pq) = 2 . 

The first two conditions tell us that p = q. Because the third condition has 
no real solutions A p — q, player 3 cannot employ a mixed strategy. Suppose 
r = 1, then either of the first two conditions produces the contradiction 1 = 0. 
If r = 0 , then we deduce that p = q = | . 



Chapter 5 

5.1 The equilibrium can be written unambiguously as {AEE, CR). 

5.2 The second player’s strategies are triples XY Z meaning “play X after L, Y 
after M and Z after R” . The backward induction solution is (L,BBB) and 
the payoff table is 



P2 





AAA 


AAB 


ABA 


ABB 


BAA 


BAB 


BBA 


BBB 


L 


0,0 


0,0 


0,0 


0,0 


6,2 


6,2 


6,2 


6,2 


M 


1,3 


1,3 


5,4 


5,4 


1,3 


1,3 


5,4 


5,4 


R 


6,2 


1,3 


6,2 


1,3 


6,2 


1,3 


6,2 


1,3 



where the pure strategy Nash equilibria are shown in bold type. 



5.3 The game has two trees as follows 
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Both players should “squeal” just as they do in the static game. 
5.5 The strategic form is 



P2 





LH 


LT 


RH 


RT 


AH 


1,-1 


-1,1 


1,-1 


-1,1 


AT 


-1,1 


1,-1 


-1,1 


1,-1 


BH 


3,1 


3,1 


-1,2 


-1,2 


BT 


3,1 


3,1 


-1,2 


-1,2 



The mixed strategies = \AH + \AT and ctJ = \RH + ^RT give 
7 ti((Ti,(T 2) = 7r2(o-i,(T2) = 0. Because ■k\{AH,G 2 ) = 7ri(ylT, ctJ) = 0 and 
■Ki{BH,a 2 ) = 7ri(BT, ctJ) = -1, we have 7ri(o-^,cr|) > tti ( cri , crj ) Vcti G Si. 
Because ^ 2 ( 0 - 1 , 52 ) = 0 Vs 2 G S 2 , we have 7T2 ((Ji,(T2) > 7T2((7i,(72) V(T 2 G S 2 . 
Hence (cr*,cr 2 ) is a Nash equilibrium. 



5.6 The Newcomer’s pure strategies are to “enter the market” (E) or to “stay out” 
(S'). The Incumbent’s pure strategies are to “engage in a price war” (IT) or to 
“accept the competition” (A). The game tree is 




and {E, A) is the unique behavioural strategy equilibrium. 
The strategic form is 





IT 


A 


S 


2,6 


2,6 


E 


1,1 


3,3 



Let p = P(Newcomer plays S) and q = P(Incumbent plays IT), then 
7Tjv(o-Af , o-/) = 3 - 2g + p{2q - 1) 



and 



ni{aN, (T/) = 3 + 3p + 2q{p - 1) 
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So the best responses for the Newcomer are 



O'/ 



(0.1) 


if g < 1 


(1,0) 


if > 1 


(*, 1 — *) with X € [0, 1] 


if = 1 


r the Incumbent are 


(0,1) 


if p < 1 


{y, 1 - y) with y e [o, i] 


if p = 1 



So the Nash equilibria are [E, A) and (S,a*) where a* = (y, 1 — y) with y > |. 

5.7 In the subgame beginning at the right-hand decision node, player 2 will always 
choose R. The simultaneous decision snbgame beginning at the second player’s 
left-hand decision node has 3 Nash equilibria: (a) (C,C), (b) {D,D), and (c) 
{o', a) where a = (|, |). These yield payoff pairs (a) (3,1), (b) (1,3), and 
(c) (1.5, 0.5). So there are three subgame perfect Nash equilibria: {AC,CR), 
{AD, DR), and {Bo,oR). 

5.8 Nash equilbria for the simultaneous choice subgame are {A, a), {B,b), and 
{o*,o*) with o* = (|, |). Because TVi{o*,o*) — 2 for i = 1,2, the subgame 
perfect Nash equilibria are {Ra,A), {Rb, B) and {Lo*,o*). 

5.9 The Nash equilibria for the simultaneous decision subgame are {A, a), {B,b), 
and { 01 , 02 ) with o* = (1/4, 3/4) and 02 = (1/6, 5/6). Payoffs for the mixed 
strategy equilbrium are '?ri(o'i , o’!) = 5/6 and t: 2 {oi, 02 ) ~ 3/4. Therefore, the 
three subgame perfect Nash equilibria are {Loi , ro^), {LB,rb), and {RA,£a). 
The equilibrium {RA, ta) is supported by the following forward induction ar- 
gument. If player 1 plays R, then player 2 would reason that player 1 will 
play A in the simultaneous subgame because that is the only way they will 
get a payoff greater than 4. So player 2 knows that they should coordinate on 
the Nash equilibrium (^4, a) in that subgame. Because player 1 “knows” that 
player 2 will reason in this way, they will indeed play R with the expectation 
of receiving 5 rather than the 4 that would be achieved for playing L. 

5.10 Player 2 should choose L if their payoff for doing so exceeds the expected payoff 
for choosing R. That is, they should choose L if and only if 2 > 3e -I- (1 — e), 
which reduces to e < |. (i.e., e = |.) 



Chapter 6 

6.1 Payoffs are 



T^i{qi,q2) = qi 

so the best responses are 



Po 1- 



gi + q2 

Qo 



Qo 






and 


Qo , 

P2 = — 


fl- — 




2 


\ Qo 


Po) 




2 ' 


1 Qo 


PoJ 
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Therefore, Nash equilibrium strategies are found by solving the simultaneous 
equations 



* 


Qo 


f. qi 


Cl 


qi = 




V Qo 


^0 


* 


Qo 


( . q* 


C2 


12 = 


2 


V Qo 


Po 



which gives 

* Qo ( , 2ci — C2 

For this to be a Nash equilibrium, we must also have > 0 and gj > 0, which 
implies that we must have 2ci — C 2 < Po and 2c2 — ci < Po- Suppose that 
0 < Cl, C 2 < l-Po, then these conditions are satisfied. So the pair of quantities 
ql and given above is indeed a Nash equilibrium. On the other hand, if 
2 c2 > Pq + ci then qj < 0 by the formula above, so it cannot be part of a Nash 
equilibrium. In this case, the Nash equilibrium is 



* 


Qo 


fi-l- 

V Po 


qi 


= ^ 


* 

12 


= 0 





because the qt given above is the best response to §2 = 0 from the equation 
for qi, and, given that Firm 1 is producing this quantity, the payoff for Firm 
2 is maximised on the domain 0 < g 2 < oo at §2 = 0. 

6.2 Because we are looking for a symmetric Nash equilibrium, assume that all the 
firms except the first are producing a quantity q and Firm 1 is producing a 
(possibly different) quantity qi. Then 

, 1 \u ( ^ (n + in-Pjq 

7Ti(<ji,(7,g, . . . ,q) = (ji Poll 

The best response for Firm 1 is then 

Qo f ^ , . . q c 

So the symmetric Nash equilibrium quantity cf is 

Qo 



q = 






_ Qo 
n + 1 

This gives a profit to each firm of 
-Ki{q ,q ,...,q ) = 



'-A 



Po 1- 



nq 

Qo 



QqPq 

(n+l)2 






— c 
2 



So lim„ 



> TTi = 0 . 
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6.3 The payoffs are 



and 



f?i(ei, 62) = -Bo(ei + ke2) + c(eo — ei) 



-82(61, 62) = So(e2 + fcei) +6(60-62). 

Each country wants to minimise its Bi, so the best response for Country 1 to 
some fixed level of pollution 62 is found by (check the second derivative) 



dBi 

dei 



(61,62) = 0 



which gives 
By symmetry 



6 l 



28r 



— ke2 ■ 






A Nash equilibrium is a pair of strategies e* and 62 that are best responses to 
each other. Using symmetry, we must have ej = 62 = e* where 

* 6 



28 o 



— ke 



28 o(l + fc) 

At equilibrium, the total amount of pollution in each country is 

which is independent of the amount of pollution coming from the adjacent 
country (i.e., the more one country affects the other, the larger is each country’s 
abatement) and the total amount of pollution is high when costs of cleaning 
up are high. 



6.4 The aggregate quantity produced in the Stackelberg model is 

Qs = gi‘ + 92* = |Qo ( 1 -;^) 

which is larger than the aggregate quantity produced in the Cournot model 

Qc = 2qh = IQo (1 - 

which implies that the market price of a single item is smaller in the Stackelberg 
model than in the Cournot model. 



6.5 In the event that the Entrant does diversify, the incumbent will make a greater 
profit if it reveals its production levels (i.e., in a Stackelberg duopoly), so the 
Entrant will be a market follower and will make a profit 



* 

71-2 = 



PqQo 

16 




The Entrant will diversify if potential profits exceed the cost of entering the 
market: 



PoQo 




> Ce . 



16 
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6.6 The result can be found by explicit calculation of the expected payoff 



noo r rx poo 

7Ti(o-t,cr2) = / p{x) / {v - y)q{y)dy - X q{y)dy 
Jo Lio Jx 

or by observing that 7 ti(0,(t|) = 0 together with the condition 

d 



dx . 



dx 



m{x, cTj) = 0 



implies that niix, nj) = 0 Vx. Hence 

7Tl(CTl,Cr2) = 0 

for all (Ji including ai = (j*. 

6.7 Because the analysis in the text was done in terms of costs, we still have 

1 



p[x) = - exp (--) ■ 
V \ V J 



But 



Pit) = Pix)'^ 



2kt 



W 



exp - 



which is not exponential and which, in turn, leads to a non-exponential distri- 
bntion of contest durations. 



Chapter 7 

7.1 Ignoring a common factor of 



the payoff table is 






Firm 2 





M 


c 


M 


1 1 


5 5 


8’8 


48 ’36 


C 


5 5 


1 1 


36 ’48 


9’9 



Setting 

15 5 1 

"=8 "=48 ^=9 

we see this game has the structure of a Prisoners’ Dilemma because t > r > 
p > s. 
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7.2 Payoff table is 



P2 





CC 


CD 


DC 


DD 


cc 


6,6 


3,8 


3,8 


0,10 


CD 


8,3 


4,4 


5,5 


1,6 


DC 


8,3 


5,5 


4,4 


1,6 


DD 


10,0 


6,1 


6,1 


2,2 



All strategies are dominated except DD for both players, so {DD, DD) is the 
unique Nash equilibrium (not just in pure strategies). 

7.3 Comparison of 7 ti(st,st), 7ti(sc,st), and 7 ti(s_d,st) leads to <5 > | as in 
Example 7.4. But we must also compare tt\[st,st) and 7ri(syi,ST). Because 

7Ti(s^, st ^ = 5 + 0 + 5 ^ +0 + 55 +... 

5 

1-52 



we require 5 > |. 



7.4 Ignoring the (irrelevant) common factor of (1 — 5) ^ the payoff table is 

Player 2 



Player 1 





Sd 


sc 


SG 


Sd 


1, 1 


5,0 


5 - 45,5 


Sc 


0,5 


3,3 


3,3 


SG 


5,5 - 45 


3,3 


3,3 



The pair (sd, sd) is always a Nash equilibrium because 5 < 1. The pair {sa, so) 
is a Nash equilibrium if 3 > 5 — 45, which reduces to 5 > For 5 < |, only 
So is undominated, so [sd,sd] is the unique Nash equilibrium in this case. For 
5 > |, the game is not generic. 

Let ai = {p,q,l — p — q) and ct 2 = (r, s, 1 — r — s). Then 

7 Ti (cri , (J 2 ) = 3 — (3 — 5)r + p[2 — r — 45 + 3r5 + 4s5] — q\rS\ 

So the best responses are (with p,q € [0, 1] and p + q = 1) 






' Sd 

(p, 0, 1 - p) 

SG 

Sd 

{pA, 1 - p- <j) 

(0,<j, 1 - (?) 



if r > 0 and 
if r > 0 and 
if r > 0 and 
if r = 0 and 
if r = 0 and 
if r = 0 and 



2 -45 + (35 - 
2 - 45 + (35 - 
2 - 45 + (35 - 
s > 1 - (25)-i 
s = 1 - (25)-^ 
s < 1 - (25)-i 



l)r + 45s > 0 
l)r + 45s = 0 
l)r + 45s < 0 



a 2 is similar with p r and g -o- s. Hence the Nash equilibria are 
a) Always defect: (sd,sd). 
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b) A specific mixture of defection (su) and conditional cooperation (sc): 
{{x, 0, 1 — x), {x, 0, 1 — x)) with X = (4(5 — 2)/(3(5 — 1). 

c) A set of mixtures of conditional {sg) and unconditional cooperation (sc): 
((0, q,l — q), (0, s, 1 — s)) with <?, s < 1 — (2(5)“^. 

7.5 Changing from sc to sb will not increase the payoff for either player. Only 
changing to sa has the potential to do this. The critical value of 5 for player 
1 is given by the inequality 



7Tl(sc, Sc) 


> 


7Ti(sa, Sc) 


2 


> 


3 + 


1-5 




1-5 


<S=^ 5 


> 


1 


The critical value of S for player 2 is 


given 


2 ■ 

by the inequality 


7T2(sC, Sc) 


> 


7T2(sc, Sa) 


3 


> 


r 25 

5 H r 


1 - 5 




1-5 


^ 5 


> 


2 



Because the actual discount factor is common to both players, we must have 



5 > max 




2 

3 



for (sc, Sc) to be a Nash equilibrium. 

7.6 Let St denote the Tit-for-Tat strategy. Now consider a stage t when player 
2 defects but player 1 does not. In stage t + 1, the Nash equilibrium (st, st) 
specifies that player 1 should defect and player 2 should cooperate. These be- 
haviours are then reversed for stage t-1-2, and so on. So in the subgame starting 
at stage t -I- 1, player 2 uses the strategy st and player 1 uses the cautious 
version of Tit-for-Tat (sa), which begins by defecting rather than cooperating 
(see Exercise 7.3). Because 5 > 2/3, {sa, st) is not a Nash equilibrium for the 
subgame starting at stage t -I- 1, as 

7Ti(sa,St) = 



< 



for (5 > |. 

7.7 Because the strategy sp only depends on the behaviour of the players in the 
previous stage, we consider the possible behaviours at state t— I and examine 
what happens if player 1 deviates from sp at stage t. (The game is symmetric 
so we don’t need to consider player 2 separately.) 

Consider the case when one of the players has used D at stage t — 1 and the 
other has used C (it does not matter which). Then sp specifies using D in 



5 -b 55^ -b 5(5"* -b . . . 
5 

l-(52 

3 

l-<5 

7Ti(st, St) 
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stage t (for both players). The total future payoff to player 1 (including that 
from stage t) is then 

, 4(5 

7Tl(Sp,Sp) = 1 + . 

Suppose that, instead of using D, player 1 uses C in stage t and then reverts 
to sp for stages t + 1 onwards. Let us denote this strategy by s' . The total 
future payoff to player 1 is then 



, , N „ e 45^ 

7Tl(s , Sp) = 0 + (5 + . 

1 — 0 

Player 1 does not benefit from the switch if 7ri(sp,sp) > 7ri(s',sp), which is 
true for all values of 5. 

Consider the case when both players have used D at stage t — 1 or both have 
used C (it does not matter which). Then sp specifies using C in stage t (for 
both players). The total future payoff to player 1 (including that from stage t) 
is then 

7Tl(sp,Sp) = ^ . 

Suppose that, instead of using C, player 1 uses D in stage t and then reverts 
to Sp for stages t + 1 onwards. Let us denote this strategy by s" . The total 
future payoff to player 1 is then 



/ " \ c I ^ I 4<5 

7Tl(s , Sp) = 5 + (5 + . 

1 — (5 

Player 1 does not benefit from the switch if 7ri(sp,sp) > 7ri(s”,sp), which 
is true if 4 + 4(5 > 5 + (5. Consequently, {sp,sp) is a subgame perfect Nash 
equilibrium if 5 > |. 

7.8 A suitable stochastic game is shown below. 



P2 




State — X 



P2 




State — y 



The game starts in state x and the Markov equivalent to Gg is the pair {a{x) = 
C,a{y) = D} (which we will write as CD). 

In state y, the effective game is just the Prisoners’ Dilemma itself so the equi- 
librium is for both players to use D. So we have 



In state x, the effective game is 
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P 2 





C 


D 


c 


3 + 57Ti (a;) ,3 + 57r| (a:) 


<571-1 (y), 5 + S-Kliy) 


D 


5 + 5'Kl(y),5-K2(y) 


1 + STvt(y),l + Sn^iy) 



or 



P 2 





C 


D 


C 


3 + 57Ti(a;),3 + 57rJ(a;) 


^ ^_L ^ 

1-5 ^ 1-5 


D 


^ 1 ^ ^ 

^ ^ 1-6 ’ 1-5 


1 1 
1-5 5 1-5 



Clearly (D, C) and (C, D) are never an equilibrium of this effective game, and 
[D, D) is an equilibrium for all values of 5. The pair of actions {C, C) is an 
equilibrium if (for i = 1,2) 



3 + STTi(x) > 5 + 



1-5 



which means that we would have 



Ti(®) > 



2-5 



5(1-5) • 

Suppose that both players choose cooperation in state x. Then 

TTi(x) = 3 + 5-7ri(a:) 

3 



i(x) = 



1-5 



Now 



> 



2-5 



5 > 



1 



1 - 5 - 5(1 - 5) ' ' “ - 2 
so (CD, CD) is a Markov-strategy Nash equilibrium if 5 > 



Chapter 8 



8.1 (a) There are two obvious ways: (i) 100% of the population uses the mixed 

strategy (^, ^); (ii) ^ of the population use the pure strategy (1,0) and % 
use the pure strategy (0, 1). There are many possible, less obvious alternatives. 
For example, | of the population uses the mixed strategy (|, |) and | use the 
mixed strategy (4, I). 

(b) X — — (i 0 + — (i - O) = (% — — ) 

^ 10 2J ^ 10 \4> 4’"/ l20’ 20’ 20/' 



8.2 Candidate ESSs are 

aw '■ Everyone uses W, then x = 1 and 7 t(W, 1) > 7t(L, 1). 
aL : Everyone uses L, then a; = 0 and -k{L,0) > 7r(W, 0). 
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fjm : A mixed strategy in which W is used | of the time, then a: = | and 
Now Xe = (p* + e{p — p*), 1 — p* — e(p — p*)). So 

Sit = 7r(cr*,Xe) — 7r((T, Xe) 

= P*-k{W, Xe) + (1 - P*)t^{L, Xe) - P-k{W, Xe) - (1 - p)7t(L, Xe) 

= (P* -P) (7r(VK,Xe) - 7r(L,Xe)) 

= (p* - p)(4p* - 3 - 4e(p* - p)) 

Taking each of the candidate ESSs in turn, we have 

aw : p* = 1, so Stt = (f — p)(l — 4e(f — p)) >0 Vp 1 and for e < e = |. 
So aw is an ESS. 

aL ■ p* — 0, so 5-k = p(3 — 4ep) >0 Vp / 0 and for e < e = |. So ai, is an 
ESS. 

am ■ p* = \, so &-K = — 4e(| — p)^ <0 Vp / | and Ve > 0. So am is not an 
ESS. 



8.3 (a) Each female child gets 1 mating with n offspring per mating and each male 
child gets (1 — p)/p matings with n offspring per mating. So the expected 
number of grandchildren for a female using <t = (p, 1 — p) is 



0.8^*^ ^ + 



0.2^ +(l-p) (^0.2 



f - p 



+ 0.8 



which simplihes to 



n 

~5 



-3^^3p^l-2p 



(b) Because P = | cannot be produced by either pure strategy, it must be 
produced by a mixed strategy. If this mixed strategy is an ESS, then its pay- 
off must be independent of p (or, equivalently, the payoffs to the two pure 
strategies must be the same). From the expression given above, this requires 

p= |. 

(c) The sex ratio produced by a strategy cr = (p, f — p) is 



p = 0.8p + 0.2(1 -p) = ^(l + 3p) . 

So to produce a sex ratio of p = | we must have P* = | and a* = (|, |). To 
prove that a* is an ESS, we need to check that 



■7r(cr*, Xg) > 7 t((T, Xg) ^a a* 



where Xg = {l — e)a*+ea. Let a — (p, 1— p) with p ^ p* . Then Xg = (pg, 1— pg) 
with Pg = (1 — e)p* +ep, which leads to a proportion of males in the population 



Now, 



= g(l + 3pg) 



(P* 




> 7r(cr,Xg) 

> 0 



However, if p > 0.5 then pg > 0.5 and, hence, fXs > 0.5. Conversely, if p < 0.5 
then Pg < 0.5 and, hence, pg < 0.5. So the inequality is satisfied for any p ^ p* , 
and hence cr* is an ESS. 
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8.4 Let p be the probability of playing H, then a = {p,l — p), a* = (1,0) and 
Xg = {1 — e + ep,e — ep). 

TT{a*,Ue) = (1 -e + ep) +e(l -p)n 

j + J5£(l - p)v + e(l - p) 2 

So 

7r(cr*,Xe) -7r(cr,Xe) = (1 - p) + e(l - p) |] 

> 0 Vp / 1 

because v > c. 

8.5 Let p be the probability of cooperating (i.e., playing C), then a = (p, 1 — p), 
a* = (0, 1), and Xg = (ep, 1 — ep). Then 

7r(cr*,Xe) = l+4ep 

7r(cr,Xg) = (1 -p) +ep(4 -p) 

So 

7r(cr*,Xe) - 7r(cr,Xe) = p(l+ep) 

> 0 Vp / 0 

8.6 (a) The three pure strategies R = (1,0,0), G = (0, 1,0), and B = (0,0, 1) are 
all ESSs. The mixed strategy Nash equilibrium with a* = (|, |, |) is not an 
ESS. 

(b) G = (1, 0) is the only ESS, because 7 t(G, G) > ^{H, G). 

(c) A = (l,0) and B = (0, 1) are both ESSs. The mixed-strategy Nash equi- 
librium a* = (p*, 1 — p*) with p* = I is not an ESS because 

7t((7*, a) — 7t(ct, a) = — 5(p — p*)^ . 

(d) The strategy a* — (2/3, 1/3) is the unique ESS because 

7t((T*, a) — 7r(o-, cr) = 4 — 12p — 9p^ 
which is positive for all p | . 

8.7 (a) The payoff table is 





Ti 


T 2 


Ti 


2,2 


1,1 


T 2 


1,1 


2,2 



By inspection, (Ti, Ti) and (T 2 , Tb) are symmetric Nash equilibria. Find mixed 
strategy Nash equilibria using the equality of payoffs theorem. 

= 7ri(B, crj) 

= q* + 2{l-q*) 

1 

~ 2' 

By symmetry p* = | , so the mixed strategy N ash equilibrium is((|,|),(|,|)). 
(b) Both Ti and T 2 are ESSs for the following reasons. Let a = (p, 1 — p), so 
Ti corresponds to p = 1 and T 2 corresponds to p = 0. 



7 Tl (^,0-2) 
2q* -I- (1 - q*) 

q* 
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— 7r(Ti,Ti) = 2 and Tv{a,Ti) — 1 + p. Hence tt{Ti,Ti) > •K{a,Ti) Vp / 1. 

— 7r(T2, Tb) = 2 and 7r(cr, T 2 ) — 2 — p. Hence tt{T 2 , T 2 ) > T 2 ) Vp / 0. 

The mixed strategy a* = (|, |) is not an ESS because (for example) 

7t(ct*, (7*) = 7r(Ti, a*) = ^ Ma ^ a* 



but 7r(Ti, Ti) = 2. 

8.8 Let w, X, y, and 2 be the probabilities of playing HH, HD, DH, and DD, re- 
spectively, at a mixed strategy Nash equilibrium [a* , a*] with a* = (w, x, y, z). 
Then 



n{HH,a*) 

Tv{HD,a*) 

n{DH,a*) 

■K(DD,a*) 



—2w + X + y + z 
—w -I- 2* -I- 32 
—w -I- 2p -b 32 
X + y + 2z 



Equating these payoffs in all possible combinations and using the constraint 
w + x + y + z= 1 gives w = z, x = y and w + x = | . Hence 

a*) = ty[HD, a*) — tv{DH, a*) — Tr{DD, a*) = Tr{a* , a*) = 1 . 

Now HD) = w + 2x + z = l but -k{HD, HD) = 2 so <t* is not an ESS. 



8.9 (a) This game has no ESSs, because the payoff is the same for all possible 
strategies. 

(b) There are no pure-strategy ESSs. The symmetric mixed-strategy Nash 
equilibrium has u* = (|, |). Because 7r(o-*, a) = ^ + p and 7r(o-, cr) = 3p — 2p^ 
(where p is the probability of playing E), we have Tr{a*, a) > 7r(a, a) Vp |. 

(c) There are no pure-strategy ESSs. The mixed-strategy Nash equilibrium 
with a* — (|, I) is also not an ESS because 7r(CT*,cr) = 7 t(ct, u) = 0 Vcr. 



Chapter 9 



9.1 Because 



k 

^Xi 

i=l 



(7r(si,x) - 7r(x))a:i 



k k 

^(7r(Si, x)xi - 7 t(x)) ^ Xi 
i=l 2=1 

7t(x)) - 7t(x)) 

0 



the result follows. 
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9.2 Under the affine transformation 7 t(x) — >■ A7t(x) — /r, so Equation 9.1 becomes 

^ = A((7r(si,x) - Ti{x.))xi . 

Introducing an adjusted time parameter t = At, we can write this as 

^ = ((7r(si,x) - 7r(x))a;i 
dr 

which is exactly the same form as the original equation. 

9.3 Because 

7t(AI,x) = {a — b)xi+2ax2 
■7t(_B,x) = ax2 

the average payoff is 7r(x = {a—b)xi+ 2 axiX 2 +ax 2 and the replicator dynamics 
is 

xi = xi{{a — b)xi + 2 ax 2 — n{x)) 

X2 = X2{ax2 — n{x.)) . 

Clearly the populations {x\ = l,X 2 = 0) and (xi = 0,X2 = 1) are fixed points. 
At the polymorphic fixed point, we must have 

(a — b)xi + 2 ax 2 — '7r(x) = 0 = ax 2 — 7t(x) 

which gives {a—b)xi = —ax 2 - Substituting this into the equation 0 * 2 — tr(x) = 0 
gives xi = 

9.4 Let X be the proportion of iL-players, then 

|a;(l-a;) - a;) 

with fixed points a;* = 0, a;* = 1, and x* = vjc. If a; < v/c, then a; > 0 and 
if a: > v/c, then i<0. Soa;— >-w/c for any initial population that is not at a 
fixed point. 

9.5 The replicator dynamics equation for the proportion of Ti-players is 

X = a;(l - a;) (7r(Ti,x) - 7r(T2,x)) 

= a;(l — a;)(2x — 1) . 

If a; > |, then a; — >■ 1 and if a: < then a: — >■ 0. From Exercise 8.7 the ESSs 
are T\ and T 2 , which correspond to these evolutionary end points. 

9.6 When a > 0, both A and B are ESSs. For o < 0, the game has a unique ESS, 
a* = (1/2, 1/2). The replicator dynamics equation is 

X = aa;(l — a:)(2x — 1) 

with fixed points a;* = 0, x* = 1 and x* = |. 

First, consider a population near to x* = 0. Let x = x* + e = e. Then we have 
£ = ae(l — e)(2e — 1) 



—ae . 
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So the fixed point x* = 0 is asymptotically stable if a > 0 and unstable if 
a < 0 . 

Now consider a population near to x* = 1. Let x = x*— e=l — e. Then we 
have 



£ = — a(l — e)(e)( 2 (l — e) — 1 ) 

« —as 



So X* = 1 is asymptotically stable if a > 0 and unstable if a < 0. 

Finally, consider a population near to x* = Let x = x*+e = |+e. Then 
we have 



e — “(2 + “ 2 ~ ~ 

1 

« -as. 

So X 3 = I is asymptotically stable if a < 0 and unstable if a > 0. 

Overall a fixed point is asymptotically stable if and only if the corresponding 
strategy is an ESS. 

9.7 The fixed points are (1,0,0), (0, 1,0), and (0,0, 1) in both cases. 



9.8 The replicator dynamics equations are 

X = x(l + 2x — {/ — 7t(x, j/)) 
y = y('i--x + 2y-^{x,y)) 



with 

7 t(x, y) — 1 + 2 x^ + 2y^ — 2xy . 

The fixed points are (0, 0), (0, 1), (1, 0), and (|, |). The points (0, 1) and (1, 0) 
are stable nodes (eigenvalues —2 and —3 in both cases). The point (|, |) is a 
saddle point (eigenvalues | and — | with eigenvectors x -\- y — 1 and x = y, 
respectively). The point (0,0) is non-hyperbolic. On the invariant lines x = 0, 
y = 0, and x — y, the population moves away from (0,0). So a qualitative 
picture of the replicator dynamics looks like the figure below. 



( 0 , 1 ) 
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9.9 Let X be the proportion of A-players and let y be the proportion of B-players. 
Set X = I + ^ and y ~ ^ + y and linearise about the fixed point x* = y* = ^ 
to get 




with 




The eigenvalues of the matrix L are both — | . Hence the fixed point is asymp- 
totically stable. Because a* = |) is a mixed Nash equilibrium strategy, 

we have, for a = (0, |, |), 



7v{a,a*) = 7v(a*,a*) = 

But 




< 1 

= 7r(cr,cr) 



SO a* is not an ESS. 



9.10 The payoff table is (c > 0) 





R 


S 


p 


R 


— c, — c 


1,-1 


-1,1 


S 


-1,1 


— c, — c 


1,-1 


P 


1,-1 


-1,1 


— c, — c 



Let X, y, and z be the proportions of R-, S-, and P-players. Then the replicator 
dynamics system is 



X = x{—cx + y — z — 'K{:K}) 
y = y{-x - cy + z - n(x)) 
z = z{x — y — cz — ■Tf(x)) 



with 7t(x) = —c{x^ + J/^ + It is easy to check that the point x = y = z = ^ 
is a fixed point. Let V be the relative entropy function, then 



dV 

dt 



= - [7t(o-*,x) -7t( 



2 , 2 , 2x 



= :^-c{x +y + z 

< 0 forx/(i,i,l 






Further Reading 



Part I 

A detailed, technical exposition of utility theory is given by Myerson (1991); 
Allingham (2002) provides a more conceptual account. The philosophical back- 
ground to rational behaviour is explored by Hargreaves Heap & Varoufakis 
(1995). Grafen (1991) gives a biological introduction to modelling animal be- 
haviour, and the mathematical foundations for the concept of fitness are dis- 
cussed by Houston & McNamara (1999). Markov decision processes are covered 
in detail by Ross (1995) and Puterman (1994). Biological applications of such 
processes are discussed by Mangel & Clark (1988). 



Part II 

Myerson (1991) and Fudenberg & Tirole (1993) give theoretical introductions 
to game theory and include many ideas not covered in this book. Good sources 
of game-theoretic models include Gibbons (1992), Gintis (2000), and Romp 
(1997). Brams (1983) provides a highly unusual, but thought-provoking, ap- 
plication of game theory. Game-theoretic models with continuous strategy sets 
are discussed by Gabszewicz (1999) and Martin (1993). The various Nash equi- 
librium refinements are discussed in detail by van Damme (1991). Stochastic 
games are covered by Filar & Vrieze (1997). 
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Part III 

The classic text on evolutionary game theory was written by Maynard Smith 
(1982). More recent, biologically-oriented texts include those by Dugatkin & 
Reeve (1998) and Houston & McNamara (1999). The evolution of the Social 
Contract is discussed by Skyrms (1996). Replicator dynamics and other forms 
of evolutionary dynamics are covered by Weibull (1995), Vega-Redondo (1996), 
and Hofbauer & Sigmund (1998). Young (1998) emphasises stochastic evolu- 
tionary dynamics. 
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